Kling O1 Image Model Review: Can Kling's First AI Image Generator Match Its Video Legacy?

Kling AI has made a name for itself as one of the most powerful AI video generators on the market, consistently impressing creators with its ability to produce high-quality footage from simple text prompts.

Now the company is trying something new with the launch of the Kling O1 image model, its first dedicated image generation model. The big question is: can Kling do images as well as it does video?

I've spent time testing the Kling O1 image model extensively, and I'm here to share what I found. Let's see what this new model brings to the table.

What Makes Kling O1 Stand Out?

Before diving into my detailed tests, let me give you a quick preview of what I found to be Kling O1's most impressive features:

Amazing Multi-Image Fusion That Keeps Original Details

One of the Kling O1 image model's biggest strengths is its ability to combine multiple reference images while keeping the original features of each source incredibly well-preserved.

Unlike many other models that blur details when mixing multiple images, Kling O1 maintains the distinct characteristics of each element with impressive accuracy.

Smart Prompt Understanding & Precise Editing

Whether adjusting specific areas of an image or modifying particular elements, the model accurately understands editing instructions.

Built on the powerful concept of Multi-modal Visual Language (MVL), it makes image editing feel as natural as conversing with a designer.

My Testing Process: Pushing Kling O1 Image Model to Its Limits

To properly evaluate Kling O1's capabilities, I focused on two primary testing scenarios that would expose both its strengths and potential weaknesses:

Test 1: Multi-Image Reference Fusion

The first test aimed to assess how well Kling O1 could handle multiple reference images simultaneously and create a cohesive composition that preserves the characteristics of each source.

I used four reference images:

Image 1: Young girl in pink dress smiling at playground
Image 2: Cocker spaniel dog sitting on green grass outdoors
Image 3: Empty birthday party room with colorful balloons and decorations
Image 4: Group of friends with surfboard celebrating at beach sunset

Then I provided the following prompt:

Please generate an image featuring the girl from Image 1 holding the dog from Image 2, with the background of Image 3, and applying the color tone and style of Image 4 to the entire photo.

And here’s the result I got:

[Result image: birthday party room decorated with balloons and a happy birthday sign]

The result shows that Kling O1 followed the instructions closely, even while drawing on four images at once. There was no confusion between the sources and no deviation from the prompt, and this multi-image handling genuinely surprised me.

However, I think the realism of this photo could be improved. Although the subject and background share the same color tone, the composite still feels somewhat discordant and unnatural.

Beyond blending scenes and subjects, I also tested the application of style and material.

I used these two images:

Image 1 (character subject reference): Anime girl illustration with scarf near cherry blossom trees
Image 2 (target material texture): Black and white houndstooth pattern fabric texture closeup

And set the prompt as:

Convert the subject of Image 1 into a photorealistic person, using the texture and material from Image 2 for the scarf.

The final image Kling O1 produced:

[Result image: smiling woman wearing checkered scarf outdoors in spring blossoms]

The final result demonstrates that Kling O1 performs quite well in terms of style transformation and material replacement.

There are some minor issues, though: an extra piece of scarf appears on the subject's chest, and the bow tie has disappeared. Logical inconsistencies like these undermine the image's overall realism.

Test 2: Iterative Precision Editing

The second test focused on evaluating Kling O1's capacity for precise, incremental modifications based on a single reference image.

This would reveal whether the model could handle complex editing workflows without degrading quality or losing context.

In the table I've compiled below, you can see a side-by-side comparison of Kling O1's performance on this test:

Ref Image: Woman holding coffee cup sitting by cafe window indoors

Prompt 1: Change the time of day to evening, with warm interior lighting from overhead lamps. Keep everything else unchanged.
Result 1: Woman sitting at yellow tone cafe bar counter inside

Prompt 2: Replace the coffee cup with a book. The woman should now be reading instead of looking out the window. Maintain the same facial features, clothing, and background.
Result 2: Woman relaxing with coffee at window seat in cafe

Prompt 3: Add light rain visible through the window. Adjust the window reflection to show the rain droplets. Do not modify the interior scene or the character.
Result 3: Woman drinking coffee by window in crowded urban cafe

Prompt 4: Change her casual attire to business professional clothing: a blazer and formal blouse. Keep her pose, facial features, and the entire background scene identical.
Result 4: Woman with coffee sitting in busy cafe by window

Prompt 5: Add another person in the background: a barista working behind the counter. Maintain the same lighting, time of day, and all other existing elements.
Result 5: Woman holding coffee sitting at cafe window with people

The results were genuinely impressive. Kling O1 demonstrated an exceptional understanding of what should change and what should remain constant.

Each iteration maintained remarkable consistency with previous versions while accurately implementing the requested modifications.

Final Thoughts: Is Kling O1 Worth Trying?

After extensive testing, the Kling O1 image model is clearly a strong entry into the AI image generation space.

The multi-modal approach works well: combining natural language prompts with reference images creates a smooth workflow that feels collaborative rather than frustrating.

Feature retention was the standout in my tests, with the model keeping the distinct characteristics of each source when combining multiple references. The step-by-step editing is also efficient, letting you make precise changes without losing context.

For creators and designers wanting excellent control and consistency, the Kling O1 image model is definitely worth trying. It successfully brings Kling's video expertise into still images while eliminating the annoying tool-switching that plagues many AI creative processes.

Is it perfect? No. But it's a strong debut that shows Kling AI is serious about image generation.

Ready to test it yourself? Head over to Pollo AI to try the Kling O1 image model, or explore other premier models available on the Pollo AI image generator to find the one that best suits your needs. It's an investment of time well worth making for any creator.
