Gemini ‘Omni’ Will Generate Media From Any Input, Starting With Video

There is a flurry of AI-related announcements coming out of Google I/O 2026 today, but perhaps the most impressive is a new multimodal model called Gemini Omni. While it’s launching as a video generator to begin with, it’ll eventually be able to incorporate images and audio too, on both the input and output side.

The idea is that you can remix different audio clips, images, and videos into a completely new clip via a custom prompt. Right now, you can only generate videos from text prompts and images within Gemini, so Omni adds the ability to combine audio clips and existing videos when generating something new: multiple sources for input, and an output that Google promises is better than ever in terms of realism and accuracy.

While image and audio generation are on the way, the ability to create videos is coming first, with a model called Gemini Omni Flash. The example Google gives is picking a few styles from images in your phone’s gallery and then applying them to an existing video, so if you wanted to, you could make a real-world video of yourself look like a Pixar animation.

Gemini Omni
Omni lets you combine videos, images, and audio into new clips.
Credit: Google

You can also edit your videos through “conversation,” says Google. That conversation aspect will be familiar to anyone who already uses Gemini to make videos: You just explain what it is you want to see, and Omni takes care of it. You can use follow-up prompts to change something specific about the video, like an object or color, or to create your very own reshoots of the scene where the action changes.

You can also change the angle or the environment of a video—transporting yourself from a bedroom to a beach scene, perhaps. Google says you can take multiple turns to refine your videos, while still being able to get back to the original clip.

Gemini’s world knowledge

Google says Gemini Omni uses “an intuitive understanding of physics” together with “Gemini’s knowledge of history, science, and cultural context” to make videos as realistic and as consistent as possible—though I’ll have to try this out for myself to see if this all works as well as Google says it will.

Omni now comes with a better understanding of forces like gravity, kinetic energy, and fluid dynamics, so there should be less AI weirdness on show. As well as building scenes, Google says, Gemini Omni reasons about what should happen next.

AI videos can often collapse because they’re trying to follow patterns from the vast number of videos in their training data, rather than follow the laws of physics. If a person disappears off-camera, they won’t necessarily still be there when the camera pans back. Google claims Gemini Omni will show fewer issues like this.

Gemini Omni
You’ll need to be signed up for a Google AI subscription to use Omni.
Credit: Google

To protect against deepfakes, Google is putting some limits on video creation. For now, you’ll only be able to use your own voice and a digital avatar based on you to generate outputs. In addition, all videos will carry Google’s invisible SynthID watermark that indicates the content is AI-generated.

Gemini Omni Flash is rolling out now in the Gemini app and Google Flow, for Google AI Plus, Pro, and Ultra subscribers. It’s also going to be available for free in YouTube Shorts and the YouTube Create app later this week.

At the time of writing, there’s no word on usage limits. Currently, those on a Google AI Plus plan ($7.99 a month) can generate two videos a day using the Veo 3.1 Lite model. It remains to be seen how generous Google is with Gemini Omni generations, which look like they take up a fair amount of AI processing power.
