I recently had the opportunity to try out Google's latest leap in AI video technology, Veo 3. It was released on May 20, 2025, during their I/O 2025 event.
Its features have generated a lot of buzz. As someone who works in AI video generation, I'm eager to share my honest thoughts and experiences with this model.

TL;DR
I tested Veo 3 by generating a podcast and several concert videos. While it’s not perfect and has some weird glitches, it’s impressive how well it creates realistic videos and synced audio.
But the model also frustrated me at times: audio and caption generation isn't fully controllable, and the text it generates is frequently garbled.
| Pros | Cons |
| --- | --- |
| ✔️ Generates video and audio in one go | ❌ Very expensive subscription plan |
| ✔️ Realistic lip-syncing and sound effects | ❌ Uncontrolled audio & caption generation |
| ✔️ High-quality visuals with good physics | ❌ Frequent visible quirks and jumbled text |
| ✔️ Integrated into Google Flow | |
Video, Audio, Voiceover, Lip Sync, … All in One Go
The first thing I noticed was how Veo 3 streamlines multiple video creation steps into one simple process.
When I use Veo 3, I'm amazed at how it brings my videos to life with sound. I can add ambient noises like birds singing in the trees or the bustling sounds of city streets, which really make my scenes feel authentic.
What impresses me most is how it can create dialogue that matches the characters' lip movements. It's so natural that sometimes it doesn't look AI-generated.
AI Video Workflow Redefined
This new multimodal ability is definitely one of the major highlights of this model. You no longer have to mess around with music or find voiceovers and lip sync separately.
This can change how people produce videos with AI:
- The old workflow: Generate videos > generate voiceover/sound effects/music > lip sync > editing.
- The new workflow with Veo 3: Just input a text prompt, and all is taken care of.
Generation Examples
I was eager to see how Veo 3 would be able to handle certain trending video requests, so I asked it to generate four unique videos.
In the first example, I requested an authentic-looking fake weather news anchor announcement describing an invasion of tacos quickly making its way into the United States.
I was surprised at how semi-realistic the footage was. While there were a few facial distortions, the announcer looked lifelike with fairly accurate lip syncing.
The next example was a novelty video of a realistic-looking, talking gorilla at a big English football match. He holds up a selfie stick in the stands among other fans and angrily rants to his viewers about an unfair call by the referee.
This was a funny result, as the gorilla looked and sounded unbelievably lifelike with natural expressions and body movements. But there were several background distortions that were still noticeable.
For the third example, I wanted to see what it would look like inside vegetables if they were cut in half.
For the most part, my prompt was followed accurately, but for some reason, the tool rendered crystallized vegetables, which compromised the visual realism I was going for. The sound is relaxing, by the way.
In this final example, I asked Veo 3 to produce a time travel cinematic sequence of a woman who travels back to April 14, 1912, and attempts to warn the passengers aboard the Titanic of the ship's sinking in the North Atlantic Ocean, near Newfoundland.
This time, I found the scene overexaggerated and the sudden disappearing sequence unnecessary. Frankly, it was quite a random and inaccurate AI video render.
All in all, Veo 3 did fairly well. It had a few glitches in prompt adherence and visual consistency, so some regeneration may be needed from time to time, but I think this AI model has the capacity to generate viral-ready videos.
Flow: A Sneak Peek of Next-Gen AI Video Production
Google released Flow alongside Veo 3. It's an AI video storyboard platform that integrates Veo 3, its predecessor video models, and plenty of AI generation and editing tools.

Storyboard
The storyboard concept isn't new. Sora introduced it, but it was overlooked due to its poor performance. Google Flow takes the storyboard concept and makes it much more useful.
You can place any clips you uploaded or generated with the Veo family of models onto a timeline, arrange them, trim them, and perform basic editing. But the coolest feature is what they call "extension".
Smooth Video Extension
Here's how it works: you take an 8-second video and can use any frame as a starting point to generate new animation that continues from that moment.

What's amazing is how smooth the transition is between the original and new content. Flow seems to analyze the motion trends in the original video rather than just using a single image as reference.
This extension feature is really important because it breaks past the typical length limits of AI-generated videos. Instead of being stuck with short clips, you can now create longer, more narrative videos.
It's similar to what Sora promised, but Google's implementation actually works well enough to be useful.
But one thing to note: right now, the extension feature only works with Veo 2, not the newer Veo 3.
Impressive, But Also Inconsistent
I was pretty excited about the cool features and stunning videos Veo 3 can deliver. But as I explored further, I also noticed that some videos I generated had quality issues.
Uncontrolled Audio & Caption Generation
One thing that really bugged me was how random the audio and caption generation felt. You can't control whether they appear, even if you specify it in your prompt.
One example is this video generated with this prompt: The 20 year old girl was very distressed and said, "What's wrong? An essay I wrote myself was determined to be AI-generated?" The girl has a hand on her head, an anxious expression, no captions.
My prompt specifically asked for the girl to speak and for no captions. The video came out completely silent, but with captions.
And in this TikTok-style example promoting a toothbrush, there is also no sound at all.
Quirks & Glitches
I also noticed some glitches in the Veo 3-generated videos. These include awkward movements or visual glitches that just didn't make sense.
For example, I tried to create a laptop unboxing video. Instead of showing someone actually opening the box and taking out the laptop, the cardboard box itself morphed directly into a laptop!
It's disappointing to see that these issues from Veo 2 still linger in the new version.
Also, I think the overall sound quality still needs refinement, and some of the generated sound effects sound odd. These issues were minor, but still noticeable when I listened closely.
Jumbled Text
Another problem was text generation quality. Veo 3 can generate captions for videos, but the text frequently came out jumbled and full of misspellings.
You can see this issue in the previous examples. And here are more examples to show you how frequent it can be.



I understand this is a common issue across many AI models. But as mentioned, you can't fully control whether captions appear. So you may need to regenerate a few times to get a clean result.
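If you script your generations, the "try again a few times" advice can be sketched as a simple retry loop. This is only an illustration: `generate` and `is_acceptable` are hypothetical callables that you would supply yourself (they are not part of any Veo or Flow API), and the stand-in generator below just simulates two bad results followed by a good one.

```python
from typing import Callable, Optional, Tuple


def generate_until_ok(
    generate: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
    prompt: str,
    max_attempts: int = 4,
) -> Tuple[Optional[str], int]:
    """Call `generate` until `is_acceptable` passes or attempts run out.

    Returns (result, attempts_used); result is None if every attempt failed.
    Both callables are hypothetical placeholders supplied by the caller.
    """
    for attempt in range(1, max_attempts + 1):
        result = generate(prompt)
        if is_acceptable(result):
            return result, attempt
    return None, max_attempts


# Stand-in generator for illustration only: the first two "clips"
# come back with jumbled captions, the third one is clean.
_fake_outputs = iter(["jumbled", "jumbled", "clean"])


def fake_generate(prompt: str) -> str:
    return next(_fake_outputs)


clip, attempts = generate_until_ok(
    fake_generate,
    lambda c: c == "clean",
    "toothbrush ad, no captions",
)
print(clip, attempts)
```

Capping the number of attempts matters in practice, since each regeneration costs time and credits.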
Pricey Access
Another downside I found is that Veo 3 is super pricey. It’s only available to users subscribed to Google’s Ultra plan, which costs $249.99 per month.
That’s a steep price. If you're just a casual user or small creator who might want to experiment with this model, then I don't think this is for you. Hopefully, Google will expand access or offer more affordable options in the future.
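To put that price in perspective, here is a quick back-of-the-envelope calculation. It assumes the $249.99 monthly price quoted above stays flat for a full year, with no discounts or taxes, which may not match Google's actual billing.

```python
from decimal import Decimal

# Monthly Ultra plan price quoted in this review (USD).
MONTHLY_PRICE = Decimal("249.99")


def annual_cost(monthly: Decimal, months: int = 12) -> Decimal:
    """Total cost over `months`, assuming a flat price with no discounts or taxes."""
    return monthly * months


print(annual_cost(MONTHLY_PRICE))  # 2999.88
```

That works out to roughly $3,000 a year under these assumptions.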
Unable to Use Veo 3? Try Pollo AI!
If you're looking for a high-quality AI video generator, but can't afford to try Veo 3, just take a look at Pollo AI!
Pollo AI is a powerful all-in-one AI video and image generator that allows you to try all the best video models in one place. Because Pollo AI is an official Google Cloud partner, you can now try Veo 3 there too!

In addition to Veo 3, you can also experience the capabilities of Runway, Vidu, Hailuo, Kling, PixVerse, …, all the advanced models you need to create high-quality videos.
What's more, Pollo AI offers a wide range of video tools to cover all your video creation needs.
For example, you can try its image to video, text to video, consistent character video, video to video generators, and multiple AI video effects to create all kinds of fun and creative AI videos.
Final Thoughts
As someone who has tried most AI video generation tools, I'm really excited about the high quality Veo 3 delivered.
The natural audio integration, realistic details, and streamlined video creation process are the features that impress me most.
On the other hand, the price limits its reach, and there’s still room for improvement in generation quality and consistency.
That said, Veo 3 still gives me a fascinating glimpse into where AI video tech is headed, and I’m curious to see how Google and other companies build on this foundation.
Also, if you're looking for an all-in-one AI video generation platform, I suggest you give Pollo AI a try!