Tired of clicking around your computer yourself like a normal human being? Does moving your mouse and tapping on your keyboard while you wait for retirement feel so 2024 to you?
So I have good news for you!
Hugging Face has just released Open Computer Agent, an open source virtual robot that can use a computer for you while you quietly sip your coffee and watch the machine do the job. Available for free since yesterday, this tool is the open source world's clear answer to OpenAI's Operator.
Because yes, that's where we are… Even our laziness now has its own AI.
So what exactly is this agent? Think of it as a somewhat slow but determined virtual intern, capable of using a Linux computer the way you would. To do its work, Open Computer Agent uses a virtual machine hosted in the Hugging Face cloud, equipped with Firefox and other applications. You give it an instruction in natural language, and it carries the instruction out like a human would: opening applications, browsing the web, clicking buttons, filling in forms…
Behind this interface sits some fairly impressive technology. The agent is based on the Qwen-VL vision models, which have a native “grounding” capability (basically, they can locate any element in an image by its coordinates). That capability is what lets the agent “see” the screen and know where to click, as if a human were looking at the interface.
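To make the idea of grounding a bit more concrete, here is a minimal sketch of the step between "the model says where the element is" and "the mouse clicks there". Everything in it is an assumption for illustration: the `click(x, y)` text format, the use of fractional coordinates, and the names `Click` and `parse_click` are hypothetical and are not taken from Open Computer Agent's actual code.

```python
import re
from typing import NamedTuple, Optional


class Click(NamedTuple):
    """A click target in screen pixels."""
    x: int
    y: int


# Hypothetical action format: the model is assumed to answer with text such as
# "click(0.25, 0.5)", where both numbers are fractions of the screenshot size.
_CLICK_RE = re.compile(
    r"click\(\s*([0-9]*\.?[0-9]+)\s*,\s*([0-9]*\.?[0-9]+)\s*\)"
)


def parse_click(model_output: str, width: int, height: int) -> Optional[Click]:
    """Turn a grounded answer into pixel coordinates, or None if no click is found."""
    match = _CLICK_RE.search(model_output)
    if match is None:
        return None
    fx, fy = float(match.group(1)), float(match.group(2))
    # Scale the fractions to pixels, then clamp so that a slightly off
    # prediction still lands inside the screen.
    x = min(max(round(fx * width), 0), width - 1)
    y = min(max(round(fy * height), 0), height - 1)
    return Click(x, y)
```

In a real agent, the returned coordinates would be handed to whatever component drives the mouse inside the virtual machine, and a `None` result would be a signal to ask the model again.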
To start, go to https://huggingface.co/spaces/smolagents/computer-agent. You will then see a minimalist interface: a field to enter your instruction, a “Let’s go!” button, and a window that displays the virtual computer the agent will use.
Once the request is launched, you will probably be placed in a virtual queue, and depending on the time of day the wait can range from a few seconds to several minutes. Once your turn comes, you will see the mouse cursor move as the AI uses this virtual computer.
First test, a simple one: “Find me pictures of Manuel Dorne (Korben)”. I click on “Let’s go!” and… magic! The agent opens Firefox, goes to Google, runs the search, clicks on Images and starts browsing the results. Sure, it takes about 45 seconds to do what you would do in 10, but it is still fascinating to watch this virtual robot handle an interface designed for humans.
Let’s try something more complex: “Use Google Maps to find the Cathedral of Clermont-Ferrand”. This time, the agent navigates to Google Maps, types “Cathedral Clermont-Ferrand” in the search bar, and does indeed find the place. Not bad at all!
It’s fun, but clearly unusable day to day because the agent is slow. Sorry, very, very, very slow. Each action takes several long seconds, sometimes even minutes. And woe to you if a CAPTCHA appears, because the agent is then completely lost in front of these tests designed precisely to tell humans from robots. In that case, I advise you to interrupt the agent and solve the CAPTCHA yourself, which breaks the charm a little.
I also tried more complex tasks, such as searching for flights, but that was a total failure. The agent got lost in the drop-down menus and calendars, and ended up giving up after clicking at random for two minutes. At other times, the requests took so long that the virtual computer went into standby or lost its connection.
So if you want to test, here are some tips:
– Be precise in your instructions
– Start with simple tasks
– Be patient (very patient)
– If the agent gets stuck, use the “Stop the Agent” button at the bottom of the interface and reload the page.
Beyond these limitations, what is really interesting here is what this tool represents. While OpenAI shows off with its proprietary Operator agent, Hugging Face shows that the open source community is not to be outdone. This is the democratized version of a technology that could ultimately change the way we interact with our computers every day.
Just imagine the potential… Today, the agent can carry out basic searches on the web. But tomorrow, with improvements, it could automate repetitive tasks such as filling in administrative forms, monitoring sites to alert you to changes, or even doing your online shopping while you sleep or float in your jacuzzi. The good life!
For people with reduced mobility, this type of technology could even represent real progress in digital accessibility. And for developers, it is also a good playground for exploring the possibilities of agentic AI without depending on proprietary solutions.
In short, there is still a long way to go: the agent must become faster and more reliable, learn to solve CAPTCHAs, and above all understand more complex instructions. But it is precisely because it is open source that these improvements could arrive quickly, carried, as always, by a community of enthusiastic developers.
So if you are curious to watch a virtual robot struggle with Firefox exactly like your grandfather, go try the tool! It’s free, but be patient!