Figure’s mission centers on integrating humanoid robots into the workforce, with logistics package manipulation as a primary application. They have introduced Helix, their Vision-Language-Action (VLA) model, and enhanced its low-level visuomotor control (System 1, S1) with several key improvements:
- Implicit Stereo Vision: Enhances depth perception for precise motion control.
- Multi-scale Visual Representation: Captures fine details while maintaining overall scene understanding for better manipulation accuracy.
- Learned Visual Proprioception: Enables seamless calibration across robots.
- Sport Mode: Increases execution speed beyond that of the human demonstrators while retaining dexterity.
Effective package handling and sorting on conveyor belts requires adaptability to various package types. These improvements not only address this complex challenge but also benefit other use cases.
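The text does not describe how stereo and multi-scale inputs are combined, so the following is only a hypothetical sketch of one simple token layout, not Figure's architecture. Each camera image is average-pooled at several resolutions, cut into patches, and tagged with a one-hot camera id, so fine scales keep detail such as package edges while coarse scales summarize the scene. The function names, the scales (1, 2, 4), and the patch size of 8 are assumptions; a real model would pass images through a learned encoder rather than using raw pixel patches.

```python
import numpy as np


def avg_pool(img: np.ndarray, factor: int) -> np.ndarray:
    """Downsample an (H, W, C) image by averaging non-overlapping factor x factor blocks."""
    h, w, c = img.shape
    h2, w2 = h // factor, w // factor
    img = img[: h2 * factor, : w2 * factor]
    return img.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))


def patchify(img: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened patch tokens of shape (N, patch * patch * C)."""
    h, w, c = img.shape
    gh, gw = h // patch, w // patch
    img = img[: gh * patch, : gw * patch]
    tokens = img.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return tokens.reshape(gh * gw, patch * patch * c)


def multiscale_stereo_tokens(left: np.ndarray, right: np.ndarray,
                             scales=(1, 2, 4), patch: int = 8) -> np.ndarray:
    """Hypothetical token sequence built from both cameras at several resolutions.

    A one-hot camera id (2 extra columns) is appended to every token so a
    downstream model can tell the left view from the right view.
    """
    out = []
    for cam_id, img in enumerate((left, right)):
        for s in scales:
            t = patchify(avg_pool(img, s), patch)
            cam = np.zeros((t.shape[0], 2))
            cam[:, cam_id] = 1.0
            out.append(np.concatenate([t, cam], axis=1))
    return np.concatenate(out, axis=0)
```

For a 64 x 64 RGB stereo pair, each camera yields 64 + 16 + 4 = 84 tokens (full, half, and quarter resolution), giving 168 tokens of width 8 * 8 * 3 + 2 = 194.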
Key Architectural Enhancements
- Visual Representation: Transitioned to a stereo vision backbone and multi-scale feature extraction, improving spatial understanding and control.
- Cross-Robot Transfer: Implemented real-time visual calibration so one policy performs consistently across diverse hardware, reducing the need for manual per-robot calibration.
- Data Curation: Focused on high-quality demonstrations, leading to significant performance improvements despite using less data.
- Inference Speedup: "Sport Mode" allows a 20–50% speed increase in manipulation tasks by optimally resampling action outputs.
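The text says only that Sport Mode resamples the policy's action outputs. A minimal sketch, assuming the policy emits a chunk of T actions meant to be executed one per control tick at a fixed rate, is to linearly interpolate that chunk onto fewer points: the robot then follows the same path in fewer ticks. The function name `resample_chunk` and the choice of linear interpolation are illustrative assumptions, not Figure's method.

```python
import numpy as np


def resample_chunk(chunk: np.ndarray, speedup: float) -> np.ndarray:
    """Compress an action chunk in time by linear interpolation (hypothetical helper).

    chunk:   (T, D) array of actions, one per control tick.
    speedup: e.g. 1.25 means the same path is played back in 1/1.25 of the ticks.
    Returns a (T_new, D) array that starts and ends at the same actions.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    t_len, dim = chunk.shape
    n_out = max(2, int(round(t_len / speedup)))
    src = np.linspace(0.0, 1.0, t_len)   # normalized time of the original actions
    dst = np.linspace(0.0, 1.0, n_out)   # normalized time of the resampled actions
    # Interpolate each action dimension (e.g. each joint target) independently.
    return np.stack([np.interp(dst, src, chunk[:, d]) for d in range(dim)], axis=1)
```

With `speedup=1.25`, a 50-step chunk becomes 40 steps, so at an unchanged control rate it finishes in 80% of the original time. Larger speedups demand faster motions from the low-level controller, which is consistent with the accuracy loss reported for excessive Sport Mode settings.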
Results
- Performance Measurement: Performance is measured by normalized effective throughput (T_eff); under optimized conditions the system runs more than 10% faster than the demonstrations in its training data.
- Stereo Impact: Added stereo capabilities improved throughput by 60% and enhanced robustness to varying package sizes.
- Quality Over Quantity: Training on high-quality, curated data increased throughput by 40% while using ⅓ less data than training on the uncurated data.
- Sport Mode Efficacy: Delivers a meaningful speedup, but pushing the speed too far reduces accuracy.
- Cross-Robot Transfer Success: The system maintained consistent performance across multiple robot platforms, demonstrating the effectiveness of learned calibration.
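The section does not define T_eff precisely. A plausible reading, assumed here rather than taken from Figure, is the policy's rate of correctly handled packages divided by the same rate in the human demonstrations, so T_eff above 1 means the robot outpaces its training data. The `Run` record and helper names below are hypothetical. The sketch also shows why the Sport Mode trade-off matters: failed attempts consume time without adding throughput.

```python
from dataclasses import dataclass


@dataclass
class Run:
    attempts: int      # packages the robot (or demonstrator) tried to handle
    successes: int     # packages handled correctly
    elapsed_s: float   # wall-clock duration of the run, in seconds


def success_rate(run: Run) -> float:
    """Fraction of attempts that succeeded."""
    return run.successes / run.attempts


def effective_throughput(run: Run) -> float:
    """Correctly handled packages per second; failures cost time but add nothing."""
    if run.elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return run.successes / run.elapsed_s


def normalized_effective_throughput(policy: Run, demos: Run) -> float:
    """T_eff under the assumed definition: policy rate divided by demonstrator rate."""
    return effective_throughput(policy) / effective_throughput(demos)
```

With made-up numbers: if demonstrators handle 100 packages in 500 s, a policy run with 108 successes out of 120 attempts in 500 s gives T_eff = 1.08, while a rushed run with 150 attempts but only 90 successes in the same time gives T_eff = 0.9, lower despite the higher attempt rate.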