Google DeepMind Debuts Gemini Robotics On-Device Visual Language Model

By Brian Heater, Managing Editor A3
06/24/2025

Gemini Robotics On-Device is more or less what it says on the tin. The new visual language model (VLM) from DeepMind is designed to run locally on robots, using on-board processing where possible. That means the system doesn’t require a constant network connection to function.

In a blog post Tuesday, DeepMind Senior Director Carolina Parada says the new, more efficient model “shows strong general-purpose dexterity and task generalization.” The program is designed specifically for “bi-arm” robots. The category encompasses most of what we would refer to as “humanoid,” while accommodating form factors outside the standard bipedal bot.

The team has used both Apptronik’s Apollo humanoid and the Franka Research 3, a force-sensitive system with a pair of industrial arms. The new model is designed to decrease robot response time, as systems are nudged closer to something we might deem ‘general purpose’ functionality.

In the examples given by Parada, the manipulators use vision data to perform several tasks that require a high level of dexterity and precision. These include household tasks like folding laundry and unzipping plastic bags, along with industrial jobs such as belt assembly, which have previously required highly specialized systems.

 The On-Device model delivers newfound developer customization, as well. “While many tasks will work out of the box, developers can also choose to adapt the model to achieve better performance for their applications,” says Parada. “Our model quickly adapts to new tasks, with as few as 50 to 100 demonstrations — indicating how well this on-device model can generalize its foundational knowledge to new tasks.”

Google says the model can also get robots like Apollo to follow natural language instructions and manipulate objects the model hasn’t been trained on.
