news February 23, 2025 · Robotics Hub

Helix: Revolutionizing Humanoid Robotics with Vision-Language-Action Integration

Figure AI's Helix model fuses vision, language and action into a single neural network, letting a humanoid robot understand spoken requests and manipulate objects it has never seen before.

Figure AI has introduced Helix, a Vision-Language-Action (VLA) model that runs entirely on board a humanoid robot and turns natural-language requests into real-world manipulation, with no task-specific programming required.

Why Helix matters

Traditional industrial robots are scripted: every motion is taught or programmed in advance. Helix points to a different paradigm: a single neural network that takes in camera images and a spoken instruction, then outputs continuous, high-frequency control commands for the robot's whole upper body.

One model, two systems. A slower “thinking” system reasons about the scene and the request; a fast control system generates smooth, high-rate motion.
Generalization. Helix can pick up thousands of household items it was never explicitly trained on, simply by being asked.
On-board inference. The model runs on embedded GPUs in the robot itself, not in the cloud.
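
As a rough illustration of the "one model, two systems" idea above, the Python sketch below runs a slow decision step only every few ticks while a fast control step runs on every tick, always steering toward the most recent decision. It is not Figure AI's implementation: the function names, the toy one-dimensional "arm", the gain and the 10:1 rate ratio are all assumptions made for the example.

```python
from typing import List, Tuple


def slow_system(position: float, instruction: str) -> float:
    """Hypothetical stand-in for the slow 'thinking' system.

    A real model would reason over camera images and language; this toy
    rule only maps the instruction to a target position.
    """
    return 1.0 if "raise" in instruction else 0.0


def fast_system(position: float, target: float, gain: float = 0.2) -> float:
    """Hypothetical stand-in for the fast control system.

    Takes one small, smooth step toward the latest target.
    """
    return position + gain * (target - position)


def run(instruction: str, ticks: int, slow_every: int) -> Tuple[List[float], int]:
    """Run the fast loop for `ticks` steps, refreshing the target every
    `slow_every` steps. Returns the trajectory and the number of slow calls."""
    position = 0.0
    target = position
    slow_calls = 0
    trajectory: List[float] = []
    for tick in range(ticks):
        if tick % slow_every == 0:
            # Slow path: re-read the scene and the request (assumed rate).
            target = slow_system(position, instruction)
            slow_calls += 1
        # Fast path: runs on every tick using the most recent target.
        position = fast_system(position, target)
        trajectory.append(position)
    return trajectory, slow_calls


if __name__ == "__main__":
    path, calls = run("raise the arm", ticks=20, slow_every=10)
    print(f"slow calls: {calls}, final position: {path[-1]:.3f}")
```

The point of the split is that the expensive reasoning step does not have to keep pace with the control loop: motion stays smooth between updates because the fast step keeps acting on the last target it was given.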

The industrial read-through

For factory automation, VLA models hint at a future where reconfiguring a cell is a matter of asking rather than reprogramming. That is still a research frontier, but the gap between collaborative robots and general-purpose humanoids is closing faster than most automation roadmaps assumed.