BrowserGym
Fails to load certain start_urls in headless mode
Hi! I want to collect trajectories of an LLM browsing the internet. Since I am running this on a cluster, I use the headless=True flag.
I initialize my env as:
env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.reddit.com"},
    wait_for_user_message=False,
    headless=True,
    viewport={"width": viewport_width, "height": viewport_height},
    timeout=20000,
    action_mapping=agent.action_set.to_python_code,
)
While this works for the start_url "https://www.google.com", it fails for Reddit and Amazon. When I run the same code locally with headless=False, it works for all three.
Is there a specific reason for it or a fix?
This is the output for reddit:
  File "/home/hpc/b232dd/b232dd14/browsergym_playground.py", line 62, in <module>
    obs, info = env.reset()
                ^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/wrappers/common.py", line 400, in reset
    return super().reset(seed=seed, options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/core.py", line 328, in reset
    return self.env.reset(seed=seed, options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/wrappers/common.py", line 293, in reset
    return env_reset_passive_checker(self.env, seed=seed, options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/utils/passive_env_checker.py", line 185, in env_reset_passive_checker
    result = env.reset(**kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/browsergym/core/env.py", line 303, in reset
    task_goal, task_info = self.task.setup(page=self.page)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/browsergym/core/task.py", line 95, in setup
    page.goto(self.start_url, timeout=10000)
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/sync_api/_generated.py", line 9006, in goto
    self._sync(
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_sync_base.py", line 115, in _sync
    return task.result()
           ^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_page.py", line 551, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_frame.py", line 145, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 528, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.TimeoutError: Page.goto: Timeout 10000ms exceeded.
Call log:
- navigating to "https://www.reddit.com/", waiting until "load"
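
To narrow this down outside of BrowserGym, the same navigation can be reproduced with Playwright alone. The traceback shows page.goto being called with timeout=10000 and waiting until "load", so the sketch below probes the URL in headless mode while varying two things: the wait condition ("load" vs. "domcontentloaded") and the user agent. That the site treats the default headless user agent differently is only an assumption, not something the log confirms. ProbeConfig, probe_configs, probe, and the DESKTOP_USER_AGENT string are hypothetical names and values chosen for illustration; they are not part of BrowserGym or Playwright.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical desktop user agent string, used only to test the assumption
# that the site reacts to the default headless user agent.
DESKTOP_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


@dataclass(frozen=True)
class ProbeConfig:
    """One combination of settings to try (illustrative helper)."""
    headless: bool
    wait_until: str            # passed to page.goto(wait_until=...)
    user_agent: Optional[str]  # None keeps Playwright's default


def probe_configs() -> List[ProbeConfig]:
    """Headless configurations: default vs. custom UA, two wait conditions."""
    return [
        ProbeConfig(headless=True, wait_until=wait_until, user_agent=user_agent)
        for user_agent in (None, DESKTOP_USER_AGENT)
        for wait_until in ("load", "domcontentloaded")
    ]


def probe(url: str, config: ProbeConfig, timeout_ms: int = 10000) -> str:
    """Navigate to url once and return "ok" or "timeout"."""
    # Imported here so the helpers above work without Playwright installed.
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=config.headless)
        try:
            context = browser.new_context(user_agent=config.user_agent)
            page = context.new_page()
            try:
                page.goto(url, timeout=timeout_ms, wait_until=config.wait_until)
                return "ok"
            except PlaywrightTimeoutError:
                return "timeout"
        finally:
            browser.close()
```

Calling probe("https://www.reddit.com", cfg) for each cfg in probe_configs() on the cluster would show whether the timeout depends on the wait condition, the user agent, or neither. If every combination times out, the cause is more likely on the network side of the cluster than in the headless setting itself.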