I tested the new image generation in GPT-4o and it's another world

Well, I’m happy: OpenAI has finally pulled its finger out and given us an image generation system that holds water. Of course, we are still far from the “photo” quality of Midjourney or Flux 1.1 Pro, but for making logos or diagrams, it looks great! Especially since it can write text almost without errors. Too cool!!

Those typical DALL-E 3 images with hieroglyphs for text are finally over! Yippee! Sam Altman, the CEO of OpenAI, announced with great fanfare the native integration of image generation directly in ChatGPT via their multimodal GPT-4o model.

Translation for those who understand quickly when you give them a lot of time: you can now ask ChatGPT to create images directly in the conversation, without going through an external tool. Super practical for quickly iterating on an image.

What makes this update very interesting is that GPT-4o completely replaces DALL-E 3 as the default image generation model and, unlike its predecessor, it excels at creating images containing readable text. No more mangled text!

The model takes a little longer to generate, but the quality is worth it… On the technical side, this evolution is the fruit of a whole year of work with about a hundred “human trainers” who labeled the training data, flagging in particular errors in text and anatomical deformations (those famous 8 fingers that we know too well). This technique, called Reinforcement Learning from Human Feedback (RLHF), made it possible to considerably refine the model's performance.

Among the features that caught my eye, there is for example the ability to create images with transparent backgrounds (perfect for logos)…

[Screenshot: SCR 20250325 SSNV]
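If you download one of these logos and want to check that the PNG really carries transparency, here is a minimal sketch in Python using only the standard library. It is my own illustration, not something ChatGPT provides: the function name `png_has_alpha_channel` is made up, and it only reads the colour type declared in the PNG header. It does not prove that any pixel is actually transparent, and it ignores palette images that use a separate transparency chunk.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha_channel(data: bytes) -> bool:
    """Return True if the PNG header declares an alpha channel.

    Only the first 26 bytes are inspected: the signature, then the IHDR
    chunk header (length + type), then width, height, bit depth and
    colour type. Palette images with a tRNS chunk are NOT detected.
    """
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("malformed PNG: IHDR chunk expected first")
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    # Colour type 4 = greyscale + alpha, 6 = truecolour + alpha (RGBA).
    return color_type in (4, 6)


# Hypothetical usage, assuming you saved the image as "logo.png":
# with open("logo.png", "rb") as f:
#     print(png_has_alpha_channel(f.read(26)))
```

For a real pixel-by-pixel check you would need an image library; this is just a quick first look at the file.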

The use of hex codes for precise colors…

[Screenshot: SCR 20250325 Suzu]

And above all, the ability to maintain visual consistency across several iterations. So if you are designing a character for a video game or a comic book, you can now gradually refine their appearance without losing the basic characteristics.

[Screenshot: SCR 20250325 SYBR]

[Screenshot: SCR 20250325 SYMV]

The model can also handle complex instructions with 10 to 20 different objects in the same image. In other words, you can ask it to draw “a cyberpunk unicorn riding a flying pizza over a futuristic city with robots dancing the Macarena and a DJ cat with turntables”, and it will do pretty well. I tested it. It’s creepy.

[Screenshot: SCR 20250325]

Another novelty that will ruffle the most prudish among you: OpenAI has relaxed its content restrictions. Sam Altman said that GPT-4o will be able to create “offensive” content to a “reasonable extent”, emphasizing the “intellectual freedom” of users. We are still far from the permissiveness of Elon Musk’s Grok, but it is a step in that direction.

[Screenshot: SCR 20250325 TDQP]

Rest assured (or not), the guardrails remain in place for truly problematic content such as child pornography or sexual deepfakes. Incidentally, the generation of this poster was interrupted… Sniff, we will never know what it depicted.

On the accessibility side, this is where it gets really interesting! The feature is available to everyone, even with a free account! Plus, Pro and Team users get it immediately, as do free users. Only Enterprise and Education accounts will have to wait a little, while developers who would like to integrate it via the API will have to wait a few weeks. This strategy from OpenAI is quite clever, since they are democratizing access to their best tools to make up for lagging behind the competition.

Now, if you are still nostalgic for DALL-E 3, you can always access it via a dedicated GPT, but frankly, after testing GPT-4o, I don’t see why you would want to go back.

To try it, nothing could be simpler: log in to ChatGPT and ask it to create an image. Here are some prompts that I tested with good results:

Generate a logo for a tech blog called “Korben” with a transparent background. The logo must be minimalist, with a baby wearing sunglasses that contain Matrix code inside.

[Screenshot: SCR 20250325 TGZS]

Create an infographic explaining how RLHF works in AI, with readable text and a modern design on a dark blue background, hex code #556d8d

[Screenshot: SCR 20250325 TMER]
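If you are wondering what a hex code like the one in that last prompt actually means, here is a small Python sketch that splits it into its red, green and blue components. This is just my illustration (the helper name `hex_to_rgb` is made up), and it assumes the six-digit `#rrggbb` form only, not the three-digit shorthand.

```python
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert a '#rrggbb' hex colour code into an (r, g, b) tuple of 0-255 ints."""
    digits = code.strip().lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    # Each pair of hex digits is one colour channel.
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return (r, g, b)


print(hex_to_rgb("#556d8d"))  # prints (85, 109, 141)
```

Blue is the strongest channel here, with red and green not far behind, so you get a fairly muted blue rather than a deep navy. Handy to know when the result does not look like what you had in mind.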

In short, I think I will use it in my work, because for making a diagram or a small image to illustrate an article about software, it’s great!

So yes, I know, GPT-4o is not perfect. I noticed that it still struggles with proportions and certain complex anatomical details. It also takes more time than DALL-E 3 to generate its images. But for a tool integrated directly into ChatGPT, it is an impressive leap forward. OpenAI is finally catching up with Google Gemini, which has offered image generation since mid-2024.

That way, no need to juggle between ChatGPT and Midjourney for your creative projects! Well… unless you are aiming for truly realistic photo quality, in which case Midjourney is still one step ahead. But for everything else, GPT-4o seems very promising. Goodbye misshapen hands and illegible text, and hello to hours spent generating images of astronaut cats eating pizza on Mars. Do not thank me for your future loss of productivity.

Try it here: https://chat.openai.com
