Zero-Shot Terrain Context Identification and Friction Estimation

Combining foundation models with clustering for zero-shot terrain identification.

Vehicles with incorrect friction estimates can lose control and even flip over.
Our approach, PC-VFE, adapts to new terrain.

Autonomous Vehicles Must Adapt On the Fly

Learning Terrain Types Without Supervision

  1. Continuous Multi-Modal Sensing
    The vehicle records images, state, and control data at each timestep, building a rich dataset during real operation.

  2. Extracting Meaningful Semantic Features
    A Vision-Language Model (e.g., CLIP) translates images into semantic latent vectors by matching them to text-based queries. The latent vectors are then augmented with basic visual features (such as brightness and color) for additional context.

  3. Unsupervised Clustering for Terrain Discovery
    Latent features are automatically grouped into clusters, each representing a distinct, discovered terrain type, without manual labeling or prior knowledge.

  4. Physics-Informed Optimization for Actionable Parameters
    For each terrain cluster, a gradient-based optimizer fine-tunes friction and related parameters, using a differentiable vehicle dynamics model to backpropagate losses from observed driving behavior.
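
The four steps above can be sketched end to end in a few dozen lines. This is a minimal illustration under stated assumptions, not the PC-VFE implementation: the image and text embeddings are synthetic stand-ins for the output of a vision-language model, plain k-means stands in for the clustering step, and the "differentiable vehicle dynamics model" is reduced to a one-line braking equation whose gradient is written out by hand. All function names, constants, and data below are hypothetical.

```python
import math
import random

G = 9.81   # gravitational acceleration, m/s^2
DT = 0.1   # assumed timestep, s

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_vector(image_emb, text_embs, temperature=0.1):
    """Step 2: softmax over cosine similarities between an image and each text query."""
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kmeans(points, k, iters=20):
    """Step 3: plain k-means; returns one cluster index per point."""
    centers = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def predict_next_speed(v, brake, mu):
    """Toy dynamics (an assumption): braking deceleration is mu * g * brake."""
    return v - mu * G * brake * DT

def fit_friction(samples, mu0=0.5, lr=0.5, steps=200):
    """Step 4: gradient descent on mean squared one-step speed prediction error."""
    mu = mu0
    for _ in range(steps):
        grad = 0.0
        for v, brake, v_next in samples:
            err = predict_next_speed(v, brake, mu) - v_next
            grad += 2.0 * err * (-G * brake * DT)  # d(err^2)/d(mu)
        mu -= lr * grad / len(samples)
    return mu

def demo(n=40, seed=0):
    rng = random.Random(seed)
    noise = lambda: rng.uniform(-0.05, 0.05)
    text_embs = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for two text-query embeddings
    true_mu = [0.9, 0.3]                  # synthetic ground truth per terrain
    features, drives = [], []
    for i in range(n):                    # Step 1: one record per timestep
        t = i % 2
        image_emb = [1.0 + noise(), 0.2 + noise()] if t == 0 else [0.2 + noise(), 1.0 + noise()]
        brightness = (0.3 if t == 0 else 0.7) + noise()
        features.append(semantic_vector(image_emb, text_embs) + [brightness])
        v, brake = rng.uniform(5.0, 15.0), rng.uniform(0.5, 1.0)
        drives.append((v, brake, predict_next_speed(v, brake, true_mu[t])))
    labels = kmeans(features, k=2)
    estimates = {c: fit_friction([d for d, l in zip(drives, labels) if l == c])
                 for c in sorted(set(labels))}
    return labels, estimates

if __name__ == "__main__":
    labels, estimates = demo()
    print(estimates)
```

Because the synthetic driving data is noise-free, each cluster's estimate converges to the friction value used to generate it. With real data, an automatic-differentiation framework and a fuller dynamics model would replace the hand-written gradient.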

Our Approach Can Be Deployed on Real Hardware

Real-Time Operation is Practical

Effective Control Even with Imperfect Estimation

Next Steps: Toward Adaptive Off-Road Autonomy

Takeaway:
Our method demonstrates that vision-language foundation models, combined with physics-informed optimization, let autonomous vehicles adapt to unseen terrain in real time, without human supervision or advance mapping, enabling more robust off-road autonomy.