[feat] Integrate BrowserGym

Open frankxu2004 opened this issue 1 year ago • 12 comments

Currently the browser is implemented with raw Playwright. Ideally we can reuse the existing BrowserGym project to enable more browser interactions than just going to a URL. This first PR replaces the current browser action with a BrowserGym integration. In the future we can easily add more of the actions BrowserGym supports to let agents interact with the web.

In the proposed setup, we need to start a new backend server browser_api_server.py as an interface for all browser-related interactions. In the future, all browser-related actions will just be HTTP API calls to this server.

I would love suggestions from core maintainers on how to best handle this server's lifecycle with the main backend server.

Right now, to run it, run poetry run python opendevin/browser_server/browser_api_server.py alongside the main backend server. Maybe we could add this line to the Makefile?

cc @neubig @xingyaoww

Fixes #1384

frankxu2004 avatar Apr 29 '24 15:04 frankxu2004

Awesome thanks a bunch @frankxu2004 ! We'll take a look.

neubig avatar Apr 29 '24 15:04 neubig

An example interaction:

==============
STEP 0

12:01:15 - PLAN
browse www.google.com
12:01:17 - ACTION
BrowseURLAction(url='www.google.com', action='browse')
12:01:18 - OBSERVATION
Error: Protocol error (Page.navigate): Cannot navigate to invalid URL
=========================== logs ===========================
navigating to "/projects/ogma3/fangzhex/OpenDevinwww.google.com", waiting until "load"
============================================================


==============
STEP 1

12:01:18 - PLAN
browse www.google.com
12:01:20 - ACTION
AgentThinkAction(thought='I need to double check the URL format before browsing again.', action='think')


==============
STEP 2

12:01:20 - PLAN
browse www.google.com
12:01:21 - ACTION
BrowseURLAction(url='https://www.google.com', action='browse')
12:01:24 - OBSERVATION
[ About ](https://about.google/?fg=1&utm_source=google-
US&utm_medium=referral&utm_campaign=hp-header) [ Store
](https://store.google.com/US?utm_source=hp_header&utm_medium=google_ooo&utm_campaign=GS100042&hl=en-
US)

[ Gmail ](https://mail.google.com/mail/&ogbl)

[ Images ](https://www.google.com/imghp?hl=en&ogbl)

[ ](https://www.google.com/intl/en/about/products)

[ Sign in
](https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/&ec=GAZAmgQ)

![Google](/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png)

Choose what you’re giving feedback on

* * * *

See more

Delete

* * Delete 

* Report inappropriate predictions 

I'm Feeling Curious

I'm Feeling Hungry

I'm Feeling Adventurous

I'm Feeling Playful

I'm Feeling Stellar

I'm Feeling Doodley

I'm Feeling Trendy

I'm Feeling Artistic

I'm Feeling Funny

[ Advertising ](https://www.google.com/intl/en_us/ads/?subid=ww-ww-et-g-awa-a-
g_hpafoot1_1!o2&utm_source=google.com&utm_medium=referral&utm_campaign=google_hpafooter&fg=1)
[ Business ](https://www.google.com/services/?subid=ww-ww-et-g-awa-a-
g_hpbfoot1_1!o2&utm_source=google.com&utm_medium=referral&utm_campaign=google_hpbfooter&fg=1)
[ How Search works ](https://google.com/search/howsearchworks/?fg=1)

[
![](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABUAAAAYCAMAAAAiV0Z6AAAAPFBMVEVLoEN0wU6CzFKCzFKCzFKCzFKCzFJSo0MSczNDmkCCzFJPoUMTczNdr0gmgziCzFITczMTczMTczMTczPh00jOAAAAFHRSTlPF/+bIsms8Ad///hX+//5/tXw7aMEAx10AAACaSURBVHgBbc4HDoRQCATQ33tbvf9dF9QxaCT9UQaltLHOh/golXKhMs5Xqa0xU1lyoa2fXFyQOsDG38qsLy4TaV+sFislovyhPzLJJrBu6eQOtpW0LjbJkzTuTDLRVNKa3uxJI+VdiRqXSeu6GW+Qxi29eLIi8H7EsYrT42BD+mQtNO5JMjRuC4lSY8V4hsLX0egGijvUSEP9AbylEsOkeCgWAAAAAElFTkSuQmCC)
Our third decade of climate action: join us
](https://sustainability.google/?utm_source=googlehpfooter&utm_medium=housepromos&utm_campaign=bottom-
footer&utm_content=)

[ Privacy ](https://policies.google.com/privacy?hl=en&fg=1) [ Terms
](https://policies.google.com/terms?hl=en&fg=1)

Settings

[ Search settings ](https://www.google.com/preferences?hl=en&fg=1)

[ Advanced search ](/advanced_search?hl=en&fg=1)

[ Your data in Search
](/history/privacyadvisor/search/unauth?utm_source=googlemenu&fg=1&cctld=com)

[ Search history ](/history/optout?hl=en&fg=1)

[ Search help
](https://support.google.com/websearch/?p=ws_results_help&hl=en&fg=1)

Send feedback

Dark theme: Off

Google apps




==============
STEP 3

12:01:24 - PLAN
browse www.google.com
12:01:27 - ACTION
AgentFinishAction(outputs={}, action='finish')
12:01:27 - opendevin:INFO: agent_controller.py:226
INFO
AgentFinishAction(outputs={}, action='finish')
12:01:27 - opendevin:INFO: agent_controller.py:107 - Task finished by agent
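
The STEP 0 failure in the log above comes from a URL without a scheme, which appears to have been resolved against the working directory. A minimal sketch of a guard that could run before the browse action, assuming a hypothetical helper named ensure_scheme (it is not part of this PR) and assuming https is an acceptable default:

```python
import re

# Matches a leading URL scheme such as "https://" or "file://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def ensure_scheme(url: str, default_scheme: str = "https") -> str:
    """Return the URL unchanged if it has a scheme, else prepend one.

    Hypothetical helper for illustration only; schemes written without
    "//" (for example "about:blank") are not handled.
    """
    url = url.strip()
    if _SCHEME_RE.match(url):
        return url
    return f"{default_scheme}://{url}"
```

With this in place, the agent's first attempt with 'www.google.com' would be rewritten to 'https://www.google.com' instead of costing an extra think step.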

frankxu2004 avatar Apr 29 '24 16:04 frankxu2004

Agreed--I'm worried about the extra overhead here to run a separate server. Also curious what's going on under the hood with gymnasium--is it running a docker container?

rbren avatar Apr 29 '24 18:04 rbren

Thanks for the comments!

@neubig

  1. Right now, the server would run alongside the main backend on the host system via poetry run python opendevin/browser_server/browser_api_server.py. It can be moved inside the sandbox Docker container for better security, but we need to figure out how the main backend and the sandboxed server would communicate.
  2. The main issue is that BrowserGym uses Playwright's sync API, which cannot be used inside an asyncio loop. The current main backend FastAPI server is based on asyncio, so putting the BrowserGym instantiation code in the backend results in playwright._impl._api_types.Error: It looks like you are using Playwright Sync API inside the asyncio loop.

A potential solution is to refactor BrowserGym to use the async Playwright API, but since BrowserGym is quite complex, this route might require a lot of work. Another option is to explore running BrowserGym in a dedicated sync thread, but I might need a bit more help with that.
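
A minimal sketch of the dedicated-thread option, using only the standard library. blocking_step is a hypothetical stand-in for a synchronous BrowserGym call, and the sketch does not verify that Playwright's sync API actually tolerates this arrangement:

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# A single worker, so every blocking call runs on the same thread and
# never on the thread that owns the asyncio event loop.
_browser_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="browser")


def blocking_step(action: str) -> dict:
    """Hypothetical stand-in for a synchronous BrowserGym/Playwright call."""
    return {"action": action, "thread": threading.current_thread().name}


async def step(action: str) -> dict:
    """Run the blocking call on the worker thread without blocking the loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_browser_executor, blocking_step, action)
```

An async request handler would then call await step(...) rather than touching the sync API directly. The single worker matters because a sync browser context is normally tied to the thread that created it.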

@rbren

  1. "going on under the hood with gymnasium--is it running a docker container?" The underlying code for BrowserGym just creates a Playwright sync_api context with Chromium, maintains that context, and exposes high-level APIs that give agents useful observations to ingest and provide an execution platform on which generated actions can run.

Hope it helps! Right now I use a single-threaded Flask server just to work around this sync vs. async incompatibility.

frankxu2004 avatar Apr 29 '24 21:04 frankxu2004

Hey @frankxu2004 , could you try this? https://pypi.org/project/nest-asyncio/

neubig avatar Apr 30 '24 02:04 neubig

> Hey @frankxu2004 , could you try this? https://pypi.org/project/nest-asyncio/

Thanks for the suggestion. I tried applying this patch in the main backend, but it seems that the event loop used by our asyncio server, uvicorn, cannot be patched:

ValueError: Can't patch loop of type <class 'uvloop.Loop'>

From the package readme: Only event loops from asyncio can be patched; Loops from other projects, such as uvloop or quamash, generally can’t be patched.

Right now I am investigating replacing the separate Flask server with multiprocessing.Process and multiprocessing.Queue, hopefully eliminating the need for an HTTP server. This way we can more easily manage the BrowserEnv's life cycle from AgentController.
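
A rough sketch of what the Process and Queue approach could look like. The names BrowserProcess and browser_worker and the placeholder observation string are assumptions for illustration, not the PR's actual code:

```python
import multiprocessing as mp

SHUTDOWN = None  # sentinel telling the worker loop to exit


def browser_worker(requests, responses):
    """Runs in the child process; a real version would create the browser env here."""
    while True:
        action = requests.get()
        if action is SHUTDOWN:
            break
        # Placeholder for executing the action in the browser.
        responses.put(f"observation for {action}")


class BrowserProcess:
    """Hypothetical parent-side handle that talks to the worker over two queues."""

    def __init__(self):
        self.requests = mp.Queue()
        self.responses = mp.Queue()
        self.process = mp.Process(
            target=browser_worker,
            args=(self.requests, self.responses),
            daemon=True,  # backstop so a stuck worker cannot outlive the parent
        )
        self.process.start()

    def step(self, action, timeout=30.0):
        """Send one action and block until its observation arrives."""
        self.requests.put(action)
        return self.responses.get(timeout=timeout)

    def close(self):
        """Ask the worker to exit and wait briefly for it."""
        self.requests.put(SHUTDOWN)
        self.process.join(timeout=5)
```

Because the sync browser code lives in its own process, it never sees the backend's event loop, which is the property the Flask server was providing.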

frankxu2004 avatar Apr 30 '24 03:04 frankxu2004

I took a look, and it seems challenging to make async and sync code live well with each other. Starting the browser in a separate thread/process and communicating with it via a Queue could work, so that we don't need to run it as a separate server. Looking forward to the implementation - please let me know if you need any help!

xingyaoww avatar Apr 30 '24 03:04 xingyaoww

Updated the implementation cc @neubig @rbren @xingyaoww

No Flask server is needed anymore. A process is created when AgentController is initialized. Question: how should we handle graceful shutdown of the created Process? I am not super familiar with the codebase, but it seems that atexit is not used for AgentController.
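
One possible answer to the shutdown question, sketched with the standard library only. The function names are hypothetical, and whether atexit is the right hook for AgentController is an open assumption:

```python
import atexit
import multiprocessing as mp


def worker(stop_event):
    """Child process body; a real version would close the browser once the event is set."""
    stop_event.wait()


def shutdown(process, stop_event, timeout=5.0):
    """Ask the worker to stop, then terminate it if it does not exit in time."""
    stop_event.set()
    process.join(timeout)
    if process.is_alive():
        process.terminate()
        process.join()


def start_worker():
    """Start the worker and register its cleanup for normal interpreter exit."""
    stop_event = mp.Event()
    process = mp.Process(target=worker, args=(stop_event,), daemon=True)
    process.start()
    # shutdown() is safe to call more than once, so an explicit call
    # followed by the atexit call does no harm.
    atexit.register(shutdown, process, stop_event)
    return process, stop_event
```

The same shutdown function could also be called from an explicit close method on the controller, leaving atexit as a fallback. atexit handlers do not run if the parent is killed by an unhandled signal, so the daemon flag remains useful as a last resort.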

frankxu2004 avatar Apr 30 '24 06:04 frankxu2004

Thanks a lot for all your comments. I have addressed them, so please review again. Thanks!

Major update:

  1. Better handling of BrowserEnv, and an abstraction of the step method for end users.
  2. Updated BrowserObservation to include all the information agents might use, while keeping the default as text only for now. Huge observation types such as DOMs and accessibility trees are removed from memory to save tokens, but remain available for future agent use.

frankxu2004 avatar May 01 '24 09:05 frankxu2004

Updated HTML and text handling. This PR defaults to using text as the content, to keep the observation the same as in previous versions.

frankxu2004 avatar May 01 '24 13:05 frankxu2004

Codecov Report

Attention: Patch coverage is 53.06122%, with 46 lines in your changes missing coverage. Please review.

:exclamation: No coverage uploaded for pull request base (main@0d77f49).

| Files | Patch % | Lines |
|---|---|---|
| opendevin/browser/browser_env.py | 43.42% | 43 Missing :warning: |
| opendevin/action/browse.py | 0.00% | 3 Missing :warning: |

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1452   +/-   ##
=======================================
  Coverage        ?   60.88%           
=======================================
  Files           ?       85           
  Lines           ?     3710           
  Branches        ?        0           
=======================================
  Hits            ?     2259           
  Misses          ?     1451           
  Partials        ?        0           


codecov-commenter avatar May 01 '24 13:05 codecov-commenter

It seems to be failing on macOS but passing on Linux, and interestingly it fails at the sandbox connection step. Any ideas why this might be? cc @xingyaoww https://github.com/OpenDevin/OpenDevin/actions/runs/8909840159/job/24468062273

frankxu2004 avatar May 01 '24 13:05 frankxu2004

Got this weird error while running it on the frontend (make run) with MonologueAgent - any idea what might cause it?

image

xingyaoww avatar May 01 '24 15:05 xingyaoww