
Fails to load certain start_urls in headless mode

Open korbinian-hoermann opened this issue 9 months ago • 2 comments

Hi! I want to collect trajectories of an LLM browsing the web. Since I am running this on a cluster, I use the headless=True flag.

I initialize my env as:

import gymnasium as gym
import browsergym.core  # registers the "browsergym/openended" task

env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.reddit.com"},
    wait_for_user_message=False,
    headless=True,
    viewport={"width": viewport_width, "height": viewport_height},
    timeout=20000,
    action_mapping=agent.action_set.to_python_code,
)

While this works for the start_url "https://www.google.com", it fails for Reddit and Amazon. When I run the same code locally with headless=False, it works for all three.

Is there a specific reason for it or a fix?

This is the output for reddit:

  File "/home/hpc/b232dd/b232dd14/browsergym_playground.py", line 62, in <module>
    obs, info = env.reset()
                ^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/wrappers/common.py", line 400, in reset
    return super().reset(seed=seed, options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/core.py", line 328, in reset
    return self.env.reset(seed=seed, options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/wrappers/common.py", line 293, in reset
    return env_reset_passive_checker(self.env, seed=seed, options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/utils/passive_env_checker.py", line 185, in env_reset_passive_checker
    result = env.reset(**kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/browsergym/core/env.py", line 303, in reset
    task_goal, task_info = self.task.setup(page=self.page)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/browsergym/core/task.py", line 95, in setup
    page.goto(self.start_url, timeout=10000)
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/sync_api/_generated.py", line 9006, in goto
    self._sync(
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_sync_base.py", line 115, in _sync
    return task.result()
           ^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_page.py", line 551, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_frame.py", line 145, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 528, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.TimeoutError: Page.goto: Timeout 10000ms exceeded.
Call log:
  - navigating to "https://www.reddit.com/", waiting until "load"
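
The traceback shows the failing call is `page.goto(self.start_url, timeout=10000)` in `browsergym/core/task.py`, waiting for the "load" event, so the `timeout=20000` passed to `gym.make` does not appear to govern this initial navigation. To check whether the problem is specific to BrowserGym, the same navigation can be reproduced with Playwright alone. The following is a minimal diagnostic sketch, not a fix: the names `context_options` and `probe` are hypothetical helpers, and the idea that the site handles headless Chromium differently, or never fires "load" within the timeout, is an untested assumption.

```python
from typing import Any, Dict, Optional


def context_options(user_agent: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for browser.new_context() (hypothetical helper)."""
    # Only override the user agent when one is given; otherwise keep the
    # browser's default.
    return {"user_agent": user_agent} if user_agent else {}


def probe(
    url: str,
    *,
    headless: bool = True,
    wait_until: str = "load",      # same event the traceback reports
    timeout_ms: int = 10_000,      # same timeout the traceback reports
    user_agent: Optional[str] = None,
) -> Dict[str, Any]:
    """Navigate to `url` once with plain Playwright and report what happened."""
    # Imported lazily so context_options() works without Playwright installed.
    from playwright.sync_api import Error, sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            context = browser.new_context(**context_options(user_agent))
            page = context.new_page()
            try:
                response = page.goto(url, wait_until=wait_until, timeout=timeout_ms)
            except Error as exc:  # Playwright's TimeoutError is a subclass of Error
                return {"ok": False, "status": None, "error": str(exc)}
            status = response.status if response is not None else None
            return {"ok": True, "status": status, "error": None}
        finally:
            browser.close()
```

Calling `probe("https://www.reddit.com")` and then `probe("https://www.reddit.com", wait_until="domcontentloaded")` on the cluster would show whether the page is reachable but slow to fire "load", or whether the navigation fails either way. Passing a `user_agent` string, or comparing against `headless=False` on a machine with a display, helps narrow down whether headless mode itself makes the difference.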

korbinian-hoermann, Jan 30 '25 15:01