
Google DeepMind Debuts Gemini Robotics On-Device Visual Language Model

By Brian Heater, Managing Editor A3

06/24/2025


Gemini Robotics On-Device is more or less what it says on the tin. The new visual language model (VLM) from DeepMind is designed to run locally on robots, using on-board processing where possible. As a result, the system doesn’t require a constant network connection to function.

In a blog post Tuesday, DeepMind Senior Director Carolina Parada says the new, more efficient model “shows strong general-purpose dexterity and task generalization.” The program is designed specifically for “bi-arm” robots. The category encompasses most of what we would refer to as “humanoid,” while accommodating form factors outside the standard bipedal bot.

The team has used both Apptronik’s Apollo humanoid and the Franka Research 3, a force-sensitive system with a pair of industrial arms. The new model is meant to decrease robot response time, as systems are nudged closer to something we might deem ‘general purpose’ functionality.

In the examples given by Parada, the robots use vision data to perform several manipulation tasks that require a high level of dexterity and precision. That includes household tasks like folding laundry and unzipping plastic bags, along with industrial jobs, including belt assembly, which have previously required highly specialized systems.

The On-Device model delivers newfound developer customization, as well. “While many tasks will work out of the box, developers can also choose to adapt the model to achieve better performance for their applications,” says Parada. “Our model quickly adapts to new tasks, with as few as 50 to 100 demonstrations — indicating how well this on-device model can generalize its foundational knowledge to new tasks.”

Google says the model can also get robots like Apollo to follow natural language instructions and manipulate objects it hasn’t been trained on.