Imagine an AI that doesn’t just understand commands but gets them, like a human would, across an array of simulated 3D environments. 

That’s the aim of DeepMind’s Scalable Instructable Multiworld Agent (SIMA).

Unlike traditional AI, which might excel at discrete tasks like strategy games or specific problem-solving, SIMA’s agents are trained to interpret human language instructions and translate them into concrete actions using a keyboard and mouse, mimicking how a human interacts with a computer.

This means that whether the task is navigating a digital landscape, solving puzzles, or interacting with objects in a game, SIMA aims to understand and execute these commands with the intuition and adaptability of a person.
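
To make that interface concrete, here is a minimal Python sketch of the contract such an agent satisfies: a screenshot and a text instruction in, keyboard and mouse events out. Everything here, from the class name to the naive keyword-matching logic, is an illustrative assumption; DeepMind has not released SIMA’s code.

```python
import numpy as np

class KeywordAgent:
    """Toy stand-in for a SIMA-style agent: pixels and text in,
    keyboard/mouse out. The keyword mapping is purely for illustration."""

    def act(self, frame: np.ndarray, instruction: str):
        """Return (keys_to_hold, (mouse_dx, mouse_dy)) for one timestep."""
        text = instruction.lower()
        if "forward" in text or "go to" in text:
            return ["w"], (0.0, 0.0)   # walk forward
        if "turn left" in text:
            return [], (-15.0, 0.0)    # swing the mouse left
        if "jump" in text:
            return ["space"], (0.0, 0.0)
        return [], (0.0, 0.0)          # idle when unsure

agent = KeywordAgent()
print(agent.act(np.zeros((720, 1280, 3), dtype=np.uint8), "turn left"))
# -> ([], (-15.0, 0.0))
```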

At this project’s core is a large, diverse dataset of human gameplay collected across research environments and commercial video games.

SIMA was trained and tested on a selection of nine video games through collaborations with eight game studios, including well-known titles like No Man’s Sky and Teardown. Each game challenges SIMA with different skills, from basic navigation and resource gathering to more complex activities like crafting and spaceship piloting.

SIMA’s training also included four research environments, used to assess its physical-interaction and object-manipulation skills.
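
Although the actual data format is unpublished, one way to picture a training example is the sketch below: a clip of screen video paired with the instruction the player was following and the input trace they produced. All field names here are assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GameplayRecord:
    """Hypothetical training example for behavioral cloning from human play."""
    frames: np.ndarray        # (T, H, W, 3) video of the player's screen
    instruction: str          # e.g. "collect wood", "open the map"
    key_events: list[tuple]   # (timestep, key, pressed) tuples
    mouse_trace: np.ndarray   # (T, 2) per-step mouse deltas
    environment_id: str       # which game or research environment it came from
```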

In terms of architecture, SIMA uses pre-trained vision and video prediction models, fine-tuned on the specific 3D settings of its game portfolio. 
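
A rough sketch of what such a composition could look like in PyTorch is below. The module sizes, the fusion transformer, and the single action head are all assumptions for illustration, not SIMA’s published architecture; the point is the shape of the pipeline: pretrained encoders in front, a learned policy head behind.

```python
import torch
import torch.nn as nn

class SimaLikePolicy(nn.Module):
    """Illustrative composition: pretrained encoders feeding a small
    fusion transformer and an action head. All details are assumptions."""

    def __init__(self, vision_encoder: nn.Module, text_encoder: nn.Module,
                 d_model: int = 512, n_actions: int = 128):
        super().__init__()
        self.vision_encoder = vision_encoder   # pretrained on images/video
        self.text_encoder = text_encoder       # pretrained language encoder
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.fuse = nn.TransformerEncoder(layer, num_layers=4)
        self.action_head = nn.Linear(d_model, n_actions)  # keyboard/mouse logits

    def forward(self, frames: torch.Tensor, instruction_tokens: torch.Tensor):
        img = self.vision_encoder(frames)             # (B, T_img, d_model)
        txt = self.text_encoder(instruction_tokens)   # (B, T_txt, d_model)
        fused = self.fuse(torch.cat([txt, img], dim=1))
        return self.action_head(fused[:, -1])         # next-action logits
```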

Unlike traditional game-playing AIs, SIMA doesn’t require source-code access or custom APIs. It works from on-screen images and user-provided instructions, issuing keyboard and mouse actions to carry out tasks.
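
In other words, the surrounding harness only needs two capabilities: screen capture and synthetic input. Here is a hedged sketch of that outer loop, with `game.screenshot()` and `game.send_input()` as hypothetical stand-ins for those OS-level layers:

```python
import time

def run_episode(agent, game, instruction: str, max_steps: int = 600):
    """Drive a game the way a human would: screenshots in, input events out.

    `agent`, `game.screenshot()`, and `game.send_input()` are hypothetical
    stand-ins; no game source code or custom API is involved.
    """
    for _ in range(max_steps):
        frame = game.screenshot()                    # raw pixels only
        keys, mouse = agent.act(frame, instruction)  # e.g. (["w"], (5.0, 0.0))
        game.send_input(keys=keys, mouse=mouse)      # press keys, move mouse
        time.sleep(0.1)                              # fixed ~10 Hz control rate
```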

In its evaluation phase, SIMA was tested on roughly 600 basic skills spanning navigation, object interaction, and menu use.
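
Skill-level evaluation of this kind typically boils down to success rates aggregated by category. A small sketch of that bookkeeping (the tuple format is an assumption, not DeepMind’s evaluation harness):

```python
from collections import defaultdict

def success_rates_by_category(results):
    """results: (skill, category, succeeded) tuples, e.g.
    ("open inventory", "menu use", True). Hypothetical format."""
    totals = defaultdict(lambda: [0, 0])   # category -> [successes, attempts]
    for _, category, succeeded in results:
        totals[category][0] += int(succeeded)
        totals[category][1] += 1
    return {cat: s / n for cat, (s, n) in totals.items()}

print(success_rates_by_category([
    ("turn left", "navigation", True),
    ("go to the ship", "navigation", False),
    ("chop tree", "object interaction", True),
]))
# -> {'navigation': 0.5, 'object interaction': 1.0}
```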

What sets SIMA apart is its focus on generality. This AI isn’t being trained to master a single game or solve a particular set of problems. Instead, DeepMind is teaching it to be adaptable, to understand instructions, and to act on them across different virtual worlds. 

By grounding language in perception and action, SIMA is learning to understand and act on our instructions.

This project represents a move closer to the dream of truly intelligent, instructable AI agents, blurring the lines between human and machine understanding.