agent-zero
Browser management.
Agent-Zero would reach its full potential if it could drive a browser such as Google Chrome in a visual way, for example being asked to open YouTube and actually doing so. Say you want to automate interaction with a website: instead of building a full bot, you would let Agent Zero run it automatically from prompts alone. It should be able to read or snapshot the website to understand what to do there.
Right now, it's not possible.
It would be great!
As I understand it, this requires extra tooling, let's say a Browser tool which can navigate, analyze, and type into a webpage.
I have limited experience with such systems. Playwright is, IMO, a good and popular choice for managing a browser in headless mode.
At every loop step such a tool might create a screenshot (or PDF) of the page and give it as input to a multimodal model to recognize.
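To make the idea above a bit more concrete, here is a rough sketch of that loop in Python. It is not how Agent Zero works, just one way it could look. The `Action` class, `run_browser_loop`, `run_with_playwright`, and the `decide` callback (standing in for the multimodal model) are all hypothetical names. Only the Playwright calls (`sync_playwright`, `chromium.launch`, `new_page`, `goto`, `click`, `fill`, `screenshot`) are real API, and it uses a PNG screenshot rather than a PDF.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """One step chosen by the model. Field names are hypothetical."""
    kind: str         # "goto", "click", "fill", or "done"
    target: str = ""  # URL for "goto", CSS selector for "click"/"fill"
    text: str = ""    # text to type for "fill"


def run_browser_loop(
    screenshot: Callable[[], bytes],
    apply: Callable[[Action], None],
    decide: Callable[[bytes, str], Action],
    goal: str,
    max_steps: int = 10,
) -> int:
    """Screenshot, ask the model, act; repeat until "done" or max_steps.

    Returns the number of actions that were applied.
    """
    steps = 0
    for _ in range(max_steps):
        action = decide(screenshot(), goal)  # model sees the current page image
        if action.kind == "done":
            break
        apply(action)
        steps += 1
    return steps


def run_with_playwright(decide: Callable[[bytes, str], Action],
                        goal: str, start_url: str) -> int:
    """Wire the loop to a headless Chromium via Playwright (assumed installed)."""
    # Imported lazily so run_browser_loop can be tried without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(start_url)

        def apply(action: Action) -> None:
            if action.kind == "goto":
                page.goto(action.target)
            elif action.kind == "click":
                page.click(action.target)
            elif action.kind == "fill":
                page.fill(action.target, action.text)
            else:
                raise ValueError(f"unknown action kind: {action.kind}")

        try:
            # page.screenshot() returns the PNG image as bytes.
            return run_browser_loop(page.screenshot, apply, decide, goal)
        finally:
            browser.close()
```

Keeping the loop separate from Playwright means the model side can be tested with a fake `screenshot`/`apply` pair before a real browser is involved. The `max_steps` cap is there so a confused model cannot loop forever.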
Yes, exactly. Imagine configuring your agent-zero for a trading website platform and running it automatically, without the insane stress.
I mentioned this in the Discord a couple of weeks or so ago: basically some OSS agent-q type functionality, or possibly an integration of agent-q which could be called like a tool. It's on my todo list to work on. I've just been super busy, but I will work on something like this when I get a chance.
Great, I'll be waiting to collaborate if it's possible.
please add this!
This would be great! +1 👍
Can one imagine Agent-Zero piloting other frameworks, for example using Open Interpreter or Anthropic computer use, to achieve this?
This would mean that the docker machine needs to be a full "computer" with the ability to launch a browser.
idealley: Open Interpreter is fragged. The two I'm looking to try with Zero are https://github.com/browser-use/browser-use with https://github.com/open-webui/open-webui . I tried to get in on Runner H, but they left me hanging. If you're looking to do computer use like Open Interpreter, you may want to look into https://github.com/bytedance/UI-TARS . I'm still trying to get a BIOS update for my other system so I can try dual 3060s, and both these models air-gapped on a dual-boot Win/Lin setup.
browser-use is already integrated into A0, and I'm cooperating with the devs of browser-use on future improvements.
frdel: That's awesome. I tried the browser use in Zero, but I thought it was another framework, since the view was small and I couldn't recognize anything in it. It was also having problems dealing with java, bot checks, etc., unlike some others that brought up my browser and let me interact/intervene. I'll have to dig into it deeper and see where it's having issues.
Do I have to have Chrome installed and set up as my default browser, or does Zero strictly deal with its own copy of Chrome? Can it work with other browsers like Firefox and Brave? I'm doing a deep dive into the docs today.