I'm not a big user of Google's AI Studio, and even less of their open models, even though I found Gemma 3 pretty good. So far, Google's AI ecosystem has rarely gotten me excited.

But this time, I couldn't ignore Gemini 2.5.

Why? Because they're announcing something that stands out a bit: a model that certainly takes its time, but that really thinks before responding.

The idea behind it is what they call "AI reasoning". Basically, instead of firing off an answer at lightning speed, even if that means getting it wrong, Gemini 2.5 takes a virtual coffee break to analyze things in more depth and check its information. In short, to reason…

This takes more time and compute, so it's potentially slower and more expensive, but Google promises more reliable results, especially for complex stuff like math or code. It's an interesting approach, and a change from the usual race to minimum latency.


With this announcement, we have a model that follows in the footsteps of OpenAI's o1 and o3, but also of DeepSeek R1 and Anthropic's latest version of Claude. It's clearly a fundamental trend, one that could form the basis of the famous autonomous "AI agents" of tomorrow.

For the moment, this Google model is experimental and can process different types of information: text, images, etc. Google hasn't detailed the whole "etc.", but it's classic multimodal stuff. Where it gets impressive is the size of the context window: 1 million tokens at launch (around 750,000 words, more than the complete Lord of the Rings!), and they're already talking about going up to 2 million tokens soon. Enough to feed it entire codebases or kilometers of documentation to analyze. Google also says it's particularly good at building visual web apps and at "agentic coding".
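Just for scale, that "750,000 words" figure follows from the usual rule of thumb of roughly 0.75 English words per token (an approximation; the real ratio varies by language and content). A quick back-of-the-envelope sketch:

```python
# Rough conversion between a token budget and English words.
# Assumption: ~0.75 words per token, a common rule of thumb
# (the actual ratio depends on the tokenizer and the text).
WORDS_PER_TOKEN = 0.75

def approx_words(tokens: int) -> int:
    """Estimate how many English words fit in a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)

print(f"1M tokens ≈ {approx_words(1_000_000):,} words")  # ≈ 750,000 words
print(f"2M tokens ≈ {approx_words(2_000_000):,} words")  # ≈ 1,500,000 words
```

Which is why a 1M-token window can comfortably hold a whole codebase or a very long document in a single prompt.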

So, does it hold up on benchmarks? Well, Mountain View has released a few figures. On Aider Polyglot (code editing), it does well (68.6%), beating the competitors mentioned above. On the other hand, on SWE-bench Verified (software engineering), the picture is more mixed: at 63.8%, it beats o3-mini and DeepSeek R1, but it stays behind Anthropic's Claude 3.7 Sonnet (70.3%).

Which goes to show: always be wary of triumphant announcements. On another multimodal test (Humanity's Last Exam), it scores 18.8%, which is supposedly "better than most" other large models. In short, it's promising on certain points, but not (yet?) a revolution across the board.

Personally, I'm looking forward to testing it with code too. That's often where you really see what an AI is made of, especially since they insist on its "agentic coding" abilities. Seeing how it manages to analyze, fix, or even write complex code will be interesting.

Anyway, for my part I'll keep testing, since it's brand new. I couldn't resist writing this article with this new version at my side. I found it pretty good, even if I still prefer Claude Sonnet 3.7, which gets my vibe better.

To test it for yourself, go through Google's developer platform, AI Studio, or, for subscribers, Gemini Advanced (the paid plan at $20/month). Careful though: as noted above, the "reasoning" costs more in resources, and Google hasn't yet announced API pricing. It might sting a bit for those who'd like to integrate it into their projects.

We'll see how it evolves…
