agent-zero Browser management.

Agent-Zero could reach its full potential if it could manage a Google Chrome browser visually, for example opening YouTube when asked to. Say you want to automate interaction with a website: instead of writing a full bot, you would have Agent Zero do it automatically from prompts alone. To work out what to do there, it would need to read the website or take a snapshot of it.

Right now, it's not possible.

Oct 02 '24 11:10 divol89

It would be great! As I understand it, this requires extra tooling, say a Browser tool that can navigate, analyze, and fill in web pages. I have limited experience with such systems. IMO Playwright is a good, popular choice for managing a browser in headless mode.

At every loop step, such a tool might capture a PDF or screenshot of the page and pass it to a multimodal model for recognition.

Oct 03 '24 10:10 alexey2baranov
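A minimal sketch of the per-step capture suggested above, assuming Python and Playwright's synchronous API (installed with `pip install playwright` followed by `playwright install chromium`). It takes a PNG screenshot rather than a PDF, since vision-capable models generally accept images, and encodes it as a base64 data URL. The helper names `capture_step` and `to_data_url` are hypothetical, and the call that sends the image to a model is left out because it depends on the provider.

```python
import base64


def to_data_url(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data URL, a form many multimodal APIs accept for images."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def capture_step(url: str) -> str:
    """One hypothetical loop step: open `url` in headless Chromium, return a screenshot data URL."""
    # Imported lazily so to_data_url() stays usable where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        png = page.screenshot(full_page=True)  # screenshot() returns the image as bytes
        browser.close()
    return to_data_url(png)
```

Calling `capture_step("https://example.com")` would return a string starting with `data:image/png;base64,`. An agent loop could pass that string, together with the task prompt, to a multimodal model, act on its answer (click, type, scroll), and then capture the next step.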

Yes, exactly. Imagine configuring Agent Zero for a trading platform website and running it automatically, without stress. Insane.

Oct 03 '24 11:10 divol89

I mentioned this in the Discord a couple of weeks ago: basically some open-source agent-q-type functionality, or possibly an integration of agent-q that could be called as a tool. It's on my to-do list; I've just been super busy, but I will work on something like this when I get a chance.

Oct 05 '24 10:10 TerminallyLazy

Great, I'll be waiting to collaborate if it's possible.

Oct 05 '24 12:10 divol89

Please add this!

Oct 20 '24 02:10 MaximPro

This would be great! +1 👍

Oct 20 '24 07:10 Hielkio

Could Agent-Zero pilot other frameworks to achieve this, for example Open Interpreter or Anthropic's computer use?

This would mean that the Docker machine needs to be a full "computer" able to launch a browser.

Dec 19 '24 10:12 idealley

idealley: Open Interpreter is fragged. The two I'm looking to try with Zero are https://github.com/browser-use/browser-use with https://github.com/open-webui/open-webui. I tried to get in on Runner H, but they left me hanging. If you're looking to do computer use like Open Interpreter, you may want to look into https://github.com/bytedance/UI-TARS. I'm still trying to get a BIOS update for my other system so I can try dual 3060s, and run both these models air-gapped on a dual-boot Windows/Linux setup.

Feb 08 '25 07:02 FatherfoxStrongpaw

browser-use is already integrated into A0. I'm cooperating with the browser-use developers on future improvements.

Feb 08 '25 07:02 frdel

frdel: That's awesome. I tried the browser use in Zero, but I thought it was another framework because the view was small and I couldn't recognize anything in it. It was also having problems dealing with Java, bot checks, etc., unlike some others that brought up my own browser and let me interact or intervene. I'll have to dig into it deeper and see where it's having issues.

Do I have to have Chrome installed and set as my default browser, or does Zero strictly use its own copy of Chrome? Can it work with other browsers like Firefox and Brave? I'm doing a deep dive into the docs today.

Feb 08 '25 16:02 FatherfoxStrongpaw
