[feat] Integrate BrowserGym
Currently the browser is implemented with raw playwright. Ideally we can reuse the existing BrowserGym work to enable more browser interactions than just going to a URL. This first PR aims to replace the current browser action with a BrowserGym integration. In the future we can easily add more actions supported by BrowserGym to enable agent interaction with the web.
In the proposed setup, we need to start a new backend server browser_api_server.py as an interface for all browser-related interactions. In the future, all browser-related actions will just be HTTP API calls to this server.
I would love suggestions from core maintainers on how to best handle this server's lifecycle with the main backend server.
Right now, to run it, run poetry run python opendevin/browser_server/browser_api_server.py alongside the main backend server. Maybe we could add this line to the Makefile?
cc @neubig @xingyaoww
Fixes #1384
Awesome thanks a bunch @frankxu2004 ! We'll take a look.
An example interaction:
==============
STEP 0
12:01:15 - PLAN
browse www.google.com
12:01:17 - ACTION
BrowseURLAction(url='www.google.com', action='browse')
12:01:18 - OBSERVATION
Error: Protocol error (Page.navigate): Cannot navigate to invalid URL
=========================== logs ===========================
navigating to "/projects/ogma3/fangzhex/OpenDevinwww.google.com", waiting until "load"
============================================================
==============
STEP 1
12:01:18 - PLAN
browse www.google.com
12:01:20 - ACTION
AgentThinkAction(thought='I need to double check the URL format before browsing again.', action='think')
==============
STEP 2
12:01:20 - PLAN
browse www.google.com
12:01:21 - ACTION
BrowseURLAction(url='https://www.google.com', action='browse')
12:01:24 - OBSERVATION
[ About ](https://about.google/?fg=1&utm_source=google-
US&utm_medium=referral&utm_campaign=hp-header) [ Store
](https://store.google.com/US?utm_source=hp_header&utm_medium=google_ooo&utm_campaign=GS100042&hl=en-
US)
[ Gmail ](https://mail.google.com/mail/&ogbl)
[ Images ](https://www.google.com/imghp?hl=en&ogbl)
[ ](https://www.google.com/intl/en/about/products)
[ Sign in
](https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/&ec=GAZAmgQ)

Choose what you’re giving feedback on
* * * *
See more
Delete
* * Delete
* Report inappropriate predictions
I'm Feeling Curious
I'm Feeling Hungry
I'm Feeling Adventurous
I'm Feeling Playful
I'm Feeling Stellar
I'm Feeling Doodley
I'm Feeling Trendy
I'm Feeling Artistic
I'm Feeling Funny
[ Advertising ](https://www.google.com/intl/en_us/ads/?subid=ww-ww-et-g-awa-a-
g_hpafoot1_1!o2&utm_source=google.com&utm_medium=referral&utm_campaign=google_hpafooter&fg=1)
[ Business ](https://www.google.com/services/?subid=ww-ww-et-g-awa-a-
g_hpbfoot1_1!o2&utm_source=google.com&utm_medium=referral&utm_campaign=google_hpbfooter&fg=1)
[ How Search works ](https://google.com/search/howsearchworks/?fg=1)
[

Our third decade of climate action: join us
](https://sustainability.google/?utm_source=googlehpfooter&utm_medium=housepromos&utm_campaign=bottom-
footer&utm_content=)
[ Privacy ](https://policies.google.com/privacy?hl=en&fg=1) [ Terms
](https://policies.google.com/terms?hl=en&fg=1)
Settings
[ Search settings ](https://www.google.com/preferences?hl=en&fg=1)
[ Advanced search ](/advanced_search?hl=en&fg=1)
[ Your data in Search
](/history/privacyadvisor/search/unauth?utm_source=googlemenu&fg=1&cctld=com)
[ Search history ](/history/optout?hl=en&fg=1)
[ Search help
](https://support.google.com/websearch/?p=ws_results_help&hl=en&fg=1)
Send feedback
Dark theme: Off
Google apps
==============
STEP 3
12:01:24 - PLAN
browse www.google.com
12:01:27 - ACTION
AgentFinishAction(outputs={}, action='finish')
12:01:27 - opendevin:INFO: agent_controller.py:226
INFO
AgentFinishAction(outputs={}, action='finish')
12:01:27 - opendevin:INFO: agent_controller.py:107 - Task finished by agent
Agreed--I'm worried about the extra overhead here to run a separate server. Also curious what's going on under the hood with gymnasium--is it running a docker container?
Thanks for the comments!
@neubig
- Right now, the server would be run alongside the main backend on the host system by poetry run python opendevin/browser_server/browser_api_server.py. It can be moved inside the sandbox docker for better security, but we need to figure out how to communicate between the main backend and the sandboxed server.
- The main issue with this is that, in BrowserGym, Playwright is used with the sync API, and as a result cannot be used inside an asyncio loop. The current main backend fastapi server is based on asyncio, so putting the browsergym instantiation code in the backend will result in playwright._impl._api_types.Error: It looks like you are using Playwright Sync API inside the asyncio loop.
A potential solution is to refactor BrowserGym to use the async playwright API, but as BrowserGym is quite complex this route might require a lot of work. Another option is to explore ways of having a sync thread for BrowserGym, but I might need a bit more help.
@rbren
- going on under the hood with gymnasium--is it running a docker container?
Under the hood, BrowserGym just creates a Playwright sync_api context with Chromium, maintains that context, and exposes high-level APIs that both provide useful observations for agents to ingest and serve as an execution platform on which generated actions can run.
Hope it helps! Right now I use a single-threaded Flask server just for this sync vs async incompatibility.
Hey @frankxu2004 , could you try this? https://pypi.org/project/nest-asyncio/
Thanks for the suggestion. I tried applying this patch in the main backend, but it seems the asyncio server we use, uvicorn, cannot be patched:
ValueError: Can't patch loop of type <class 'uvloop.Loop'>
From the package readme: Only event loops from asyncio can be patched; Loops from other projects, such as uvloop or quamash, generally can’t be patched.
Right now I am investigating replacing the separate Flask server with multiprocessing.Process and multiprocessing.Queue, hopefully eliminating the need for an HTTP server. This way we can more easily handle the BrowserEnv's life cycle with AgentController.
I took a look, and it seems challenging to make async and sync code live well with each other. Starting the browser in a separate thread/process and communicating with it via a Queue could work, so that we don't need to start it as a separate server. Looking forward to the implementation though - please let me know if you need any help!!
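A minimal sketch of the Process plus Queue idea discussed above, not the code in this PR: FakeBrowserEnv, browser_worker, BrowserProcess, and the SHUTDOWN sentinel are all hypothetical names, and the fake environment stands in for whatever sync environment the worker would really own.

```python
import multiprocessing as mp
from typing import Any, Dict

SHUTDOWN = "SHUTDOWN"  # sentinel asking the worker loop to exit


class FakeBrowserEnv:
    """Hypothetical stand-in for a synchronous browser environment."""

    def step(self, action: str) -> Dict[str, Any]:
        return {"action": action, "text": f"observation for {action}"}

    def close(self) -> None:
        pass


def browser_worker(requests, responses) -> None:
    """Own the sync environment and serve requests until the sentinel arrives."""
    env = FakeBrowserEnv()
    try:
        while True:
            request = requests.get()  # blocks until the parent sends something
            if request == SHUTDOWN:
                break
            responses.put(env.step(request))
    finally:
        env.close()


class BrowserProcess:
    """Parent-side handle: starts the worker and exchanges messages with it."""

    def __init__(self) -> None:
        self.requests = mp.Queue()
        self.responses = mp.Queue()
        self.process = mp.Process(
            target=browser_worker,
            args=(self.requests, self.responses),
            daemon=True,  # do not keep the parent alive if close() is never called
        )
        self.process.start()

    def step(self, action: str, timeout: float = 30.0) -> Dict[str, Any]:
        self.requests.put(action)
        return self.responses.get(timeout=timeout)

    def close(self) -> None:
        if self.process.is_alive():
            self.requests.put(SHUTDOWN)
            self.process.join(timeout=5)
            if self.process.is_alive():
                self.process.terminate()  # fall back if the worker did not exit
```

A caller would create BrowserProcess(), call step(...) for each action, and call close() when done. On platforms that start processes with spawn, the code creating BrowserProcess needs to sit under an if __name__ == "__main__": guard.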
Updated the implementation cc @neubig @rbren @xingyaoww
No Flask server needed. A process is created when AgentController is initialized. Question: how should we handle graceful shutdown of the created Process? I am not super familiar with the codebase, but it seems that atexit is not used for AgentController.
Thanks a lot for all your comments. I have addressed them, so please review again. Thanks!
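One possible shape for the shutdown question, sketched with the standard atexit module: an idempotent close() that is registered at construction time, so the cleanup runs either when the owner calls it or at interpreter exit, but only once. ManagedWorker and stop_callback are hypothetical names; this says nothing about how AgentController is actually structured.

```python
import atexit
import threading
from typing import Callable


class ManagedWorker:
    """Hypothetical owner of a background worker that must be stopped exactly once."""

    def __init__(self, stop_callback: Callable[[], None]) -> None:
        self._stop_callback = stop_callback
        self._closed = False
        self._lock = threading.Lock()
        # Safety net: run close() at normal interpreter exit if nobody called it.
        atexit.register(self.close)

    def close(self) -> None:
        with self._lock:
            if self._closed:
                return  # already stopped; later calls are no-ops
            self._closed = True
        # Drop the exit hook so the object is not kept alive until exit.
        atexit.unregister(self.close)
        self._stop_callback()
```

Here stop_callback would be whatever actually stops the process, for example sending a shutdown message and joining it. Note that atexit handlers do not run if the interpreter is killed by an unhandled signal or exits via os._exit.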
Major update:
- Better handling of BrowserEnv and abstraction of the step method for end users
- Updated BrowserObservation to include all possible information for agents to use; however, the default is currently kept as text only. Removed some huge observation types such as DOMs and accessibility trees from memory to save tokens, but they remain available for future agent use.
Updated html and text handling. Default to using text as content for this PR, to maintain the same observation as previous versions
Codecov Report
Attention: Patch coverage is 53.06122% with 46 lines in your changes are missing coverage. Please review.
:exclamation: No coverage uploaded for pull request base (main@0d77f49).
| Files | Patch % | Lines |
|---|---|---|
| opendevin/browser/browser_env.py | 43.42% | 43 Missing :warning: |
| opendevin/action/browse.py | 0.00% | 3 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #1452 +/- ##
=======================================
Coverage ? 60.88%
=======================================
Files ? 85
Lines ? 3710
Branches ? 0
=======================================
Hits ? 2259
Misses ? 1451
Partials ? 0
It seems to be failing on macOS but passing on Linux, and interestingly it fails at the sandbox connection part. Any ideas why this might be? cc @xingyaoww https://github.com/OpenDevin/OpenDevin/actions/runs/8909840159/job/24468062273
Got this weird error while running it on the frontend (make run) with MonologueAgent - any idea what might cause it?