task hanging in VWA with vision agent

Open recursix opened this issue 1 year ago • 4 comments

It's odd that the non-vision agent had far fewer errors than the vision-based agent.

INFO:root:All jobs are finished. Calling agent_args.close() on all agents...
INFO:root:Experiment finished.
Searching experiments directories.: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1266/1266 [00:00<00:00, 105094.19it/s]
Loading results: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 910/910 [00:00<00:00, 2803.40it/s]
INFO:agentlab.experiments.study:
                                                    avg_reward  std_err  avg_steps n_completed  n_err  cum_cost
agent.agent_name                    env.benchmark
GenericAgent-gpt-4o-mini-2024-07-18 visualwebarena       0.146    0.012     10.081     910/910    134   75.9531
INFO:root:Found 134 incomplete experiments in /home/toolkit/agentlab_results/2024-11-18_02-26-48_2-agents-on-2-benchmarks/2024-11-18_09-09-04_genericagent-gpt-4o-mini-2024-07-18-on-visualwebarena.
INFO:root:Make sure the processes that were running are all stopped. Otherwise,
WARNING:agentlab.experiments.study:Study genericagent-gpt-4o-mini-2024-07-18-on-visualwebarena did not finish after 3 trials. There are 134 incomplete experiments.
Searching experiments directories.: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2401/2401 [00:00<00:00, 148941.40it/s]
Loading results: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1820/1820 [00:00<00:00, 44543.97it/s]
INFO:agentlab.experiments.study:
                                                                                   avg_reward  std_err  avg_steps n_completed  n_err  cum_cost
agent.agent_name                    env.benchmark  agent.flags.obs.use_screenshot
GenericAgent-gpt-4o-mini-2024-07-18 visualwebarena False                                0.141    0.012     12.130     910/910      3   27.5141
GenericAgent-gpt-4o-mini-2024-07-18 visualwebarena True                                 0.146    0.012     10.081     910/910    134   75.9531

133x : Exception uncaught by agent or environment in task <task_name>.
TimeoutError:
Timeout 10000ms exceeded.
========================

  • visualwebarena.100 seed: 25
  • visualwebarena.101 seed: 12
  • visualwebarena.102 seed: 31
  • visualwebarena.103 seed: 31
  • visualwebarena.104 seed: 3
  • visualwebarena.105 seed: 29
  • visualwebarena.106 seed: 22
  • visualwebarena.107 seed: 14
  • visualwebarena.108 seed: 28
  • visualwebarena.109 seed: 12
  • visualwebarena.110 seed: 31
  • visualwebarena.111 seed: 6
  • visualwebarena.112 seed: 21
  • visualwebarena.113 seed: 27
  • visualwebarena.114 seed: 1
  • visualwebarena.115 seed: 5
  • visualwebarena.116 seed: 27
  • visualwebarena.117 seed: 27
  • visualwebarena.119 seed: 29
  • visualwebarena.120 seed: 10
  • visualwebarena.121 seed: 27
  • visualwebarena.122 seed: 24
  • visualwebarena.123 seed: 32
  • visualwebarena.124 seed: 0
  • visualwebarena.125 seed: 26
  • visualwebarena.126 seed: 12
  • visualwebarena.127 seed: 2
  • visualwebarena.128 seed: 5
  • visualwebarena.129 seed: 7
  • visualwebarena.130 seed: 26
  • visualwebarena.131 seed: 8
  • visualwebarena.132 seed: 32
  • visualwebarena.133 seed: 23
  • visualwebarena.134 seed: 14
  • visualwebarena.135 seed: 31
  • visualwebarena.136 seed: 31
  • visualwebarena.137 seed: 23
  • visualwebarena.138 seed: 11
  • visualwebarena.139 seed: 1
  • visualwebarena.140 seed: 2
  • visualwebarena.141 seed: 16
  • visualwebarena.142 seed: 1
  • visualwebarena.146 seed: 31
  • visualwebarena.147 seed: 32
  • visualwebarena.148 seed: 0
  • visualwebarena.149 seed: 18
  • visualwebarena.150 seed: 1
  • visualwebarena.151 seed: 25
  • visualwebarena.152 seed: 31
  • visualwebarena.153 seed: 5
  • visualwebarena.154 seed: 31
  • visualwebarena.156 seed: 10
  • visualwebarena.157 seed: 16
  • visualwebarena.158 seed: 23
  • visualwebarena.161 seed: 5
  • visualwebarena.162 seed: 21
  • visualwebarena.163 seed: 10
  • visualwebarena.164 seed: 15
  • visualwebarena.165 seed: 32
  • visualwebarena.166 seed: 8
  • visualwebarena.167 seed: 5
  • visualwebarena.168 seed: 15
  • visualwebarena.169 seed: 28
  • visualwebarena.170 seed: 2
  • visualwebarena.171 seed: 19
  • visualwebarena.172 seed: 18
  • visualwebarena.173 seed: 25
  • visualwebarena.174 seed: 2
  • visualwebarena.175 seed: 18
  • visualwebarena.176 seed: 19
  • visualwebarena.177 seed: 31
  • visualwebarena.178 seed: 6
  • visualwebarena.179 seed: 32
  • visualwebarena.180 seed: 17
  • visualwebarena.181 seed: 0
  • visualwebarena.182 seed: 10
  • visualwebarena.183 seed: 27
  • visualwebarena.184 seed: 24
  • visualwebarena.185 seed: 22
  • visualwebarena.186 seed: 30
  • visualwebarena.187 seed: 29
  • visualwebarena.188 seed: 6
  • visualwebarena.189 seed: 15
  • visualwebarena.190 seed: 25
  • visualwebarena.191 seed: 1
  • visualwebarena.192 seed: 0
  • visualwebarena.193 seed: 11
  • visualwebarena.194 seed: 4
  • visualwebarena.195 seed: 31
  • visualwebarena.196 seed: 8
  • visualwebarena.197 seed: 18
  • visualwebarena.198 seed: 15
  • visualwebarena.199 seed: 2
  • visualwebarena.200 seed: 19
  • visualwebarena.201 seed: 23
  • visualwebarena.202 seed: 32
  • visualwebarena.204 seed: 10
  • visualwebarena.206 seed: 19
  • visualwebarena.207 seed: 24
  • visualwebarena.209 seed: 28
  • visualwebarena.210 seed: 17
  • visualwebarena.211 seed: 17
  • visualwebarena.212 seed: 1
  • visualwebarena.214 seed: 32
  • visualwebarena.215 seed: 3
  • visualwebarena.216 seed: 32
  • visualwebarena.218 seed: 20
  • visualwebarena.220 seed: 7
  • visualwebarena.221 seed: 6
  • visualwebarena.224 seed: 32
  • visualwebarena.225 seed: 11
  • visualwebarena.226 seed: 21
  • visualwebarena.227 seed: 21
  • visualwebarena.228 seed: 29
  • visualwebarena.229 seed: 7
  • visualwebarena.230 seed: 26
  • visualwebarena.231 seed: 26
  • visualwebarena.232 seed: 33
  • visualwebarena.233 seed: 20
  • visualwebarena.305 seed: 18
  • visualwebarena.305 seed: 18
  • visualwebarena.88 seed: 26
  • visualwebarena.89 seed: 0
  • visualwebarena.90 seed: 13
  • visualwebarena.91 seed: 2
  • visualwebarena.92 seed: 0
  • visualwebarena.93 seed: 4
  • visualwebarena.94 seed: 25
  • visualwebarena.95 seed: 13
  • visualwebarena.96 seed: 26
  • visualwebarena.97 seed: 8
  • visualwebarena.98 seed: 14
  • visualwebarena.99 seed: 14

Showing Max 3 stack traces:

2024-11-18 15:45:43,101 - 2060124 - browsergym.experiments.loop - WARNING - Exception uncaught by agent or environment in task visualwebarena.100.
TimeoutError:
Timeout 10000ms exceeded.
=========================== logs ===========================
waiting for get_by_role("button", name="Log in")
  locator resolved to <button type="submit" class="btn btn-primary">Log in</button>
attempting click action
  waiting for element to be visible, enabled and stable
  element is visible, enabled and stable
  scrolling into view if needed
...
...truncated middle of the log
...
  File "/home/toolkit/dev/BrowserGym/browsergym/experiments/src/browsergym/experiments/loop.py", line 437, in from_reset
    self.obs, env_info = env.reset(seed=seed)
                         ^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/gymnasium/wrappers/time_limit.py", line 75, in reset
    return self.env.reset(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/gymnasium/wrappers/order_enforcing.py", line 61, in reset
    return self.env.reset(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/dev/BrowserGym/browsergym/core/src/browsergym/core/env.py", line 303, in reset
    task_goal, task_info = self.task.setup(page=self.page)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/dev/BrowserGym/browsergym/visualwebarena/src/browsergym/visualwebarena/task.py", line 218, in setup
    self.webarena_instance.ui_login(site=site, page=page)
  File "/home/toolkit/dev/BrowserGym/browsergym/visualwebarena/src/browsergym/visualwebarena/instance.py", line 97, in ui_login
    page.get_by_role("button", name="Log in").click()
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/sync_api/_generated.py", line 15749, in click
    self._sync(
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_sync_base.py", line 109, in _sync
    return task.result()
           ^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_locator.py", line 159, in click
    return await self._frame.click(self._selector, strict=True, **params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_frame.py", line 483, in click
    await self._channel.send("click", locals_to_params(locals()))
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 490, in wrap_api_call
    return await cb()
           ^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 99, in inner_send
    result = next(iter(done)).result()
             ^^^^^^^^^^^^^^^^^^^^^^^^^
playwright._impl._api_types.TimeoutError: Timeout 10000ms exceeded.
=========================== logs ===========================
waiting for get_by_role("button", name="Log in")
  locator resolved to <button type="submit" class="btn btn-primary">Log in</button>
attempting click action
  waiting for element to be visible, enabled and stable
  element is visible, enabled and stable
  scrolling into view if needed
  done scrolling
  performing click action
  click action done
  waiting for scheduled navigations to finish
============================================================


2024-11-18 15:45:44,621 - 2060127 - browsergym.experiments.loop - WARNING - Exception uncaught by agent or environment in task visualwebarena.101.
TimeoutError:
Timeout 10000ms exceeded.
=========================== logs ===========================
waiting for get_by_role("button", name="Log in")
  locator resolved to <button type="submit" class="btn btn-primary">Log in</button>
attempting click action
  waiting for element to be visible, enabled and stable
  element is visible, enabled and stable
  scrolling into view if needed
...
...truncated middle of the log
...
  File "/home/toolkit/dev/BrowserGym/browsergym/experiments/src/browsergym/experiments/loop.py", line 437, in from_reset
    self.obs, env_info = env.reset(seed=seed)
                         ^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/gymnasium/wrappers/time_limit.py", line 75, in reset
    return self.env.reset(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/gymnasium/wrappers/order_enforcing.py", line 61, in reset
    return self.env.reset(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/dev/BrowserGym/browsergym/core/src/browsergym/core/env.py", line 303, in reset
    task_goal, task_info = self.task.setup(page=self.page)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/dev/BrowserGym/browsergym/visualwebarena/src/browsergym/visualwebarena/task.py", line 218, in setup
    self.webarena_instance.ui_login(site=site, page=page)
  File "/home/toolkit/dev/BrowserGym/browsergym/visualwebarena/src/browsergym/visualwebarena/instance.py", line 97, in ui_login
    page.get_by_role("button", name="Log in").click()
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/sync_api/_generated.py", line 15749, in click
    self._sync(
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_sync_base.py", line 109, in _sync
    return task.result()
           ^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_locator.py", line 159, in click
    return await self._frame.click(self._selector, strict=True, **params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_frame.py", line 483, in click
    await self._channel.send("click", locals_to_params(locals()))
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 490, in wrap_api_call
    return await cb()
           ^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 99, in inner_send
    result = next(iter(done)).result()
             ^^^^^^^^^^^^^^^^^^^^^^^^^
playwright._impl._api_types.TimeoutError: Timeout 10000ms exceeded.
=========================== logs ===========================
waiting for get_by_role("button", name="Log in")
  locator resolved to <button type="submit" class="btn btn-primary">Log in</button>
attempting click action
  waiting for element to be visible, enabled and stable
  element is visible, enabled and stable
  scrolling into view if needed
  done scrolling
  performing click action
  click action done
  waiting for scheduled navigations to finish
============================================================


2024-11-18 15:46:01,765 - 2060128 - browsergym.experiments.loop - WARNING - Exception uncaught by agent or environment in task visualwebarena.102.
TimeoutError:
Timeout 10000ms exceeded.
=========================== logs ===========================
waiting for get_by_role("button", name="Log in")
  locator resolved to <button type="submit" class="btn btn-primary">Log in</button>
attempting click action
  waiting for element to be visible, enabled and stable
  element is visible, enabled and stable
  scrolling into view if needed
...
...truncated middle of the log
...
  File "/home/toolkit/dev/BrowserGym/browsergym/experiments/src/browsergym/experiments/loop.py", line 437, in from_reset
    self.obs, env_info = env.reset(seed=seed)
                         ^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/gymnasium/wrappers/time_limit.py", line 75, in reset
    return self.env.reset(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/gymnasium/wrappers/order_enforcing.py", line 61, in reset
    return self.env.reset(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/dev/BrowserGym/browsergym/core/src/browsergym/core/env.py", line 303, in reset
    task_goal, task_info = self.task.setup(page=self.page)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/dev/BrowserGym/browsergym/visualwebarena/src/browsergym/visualwebarena/task.py", line 218, in setup
    self.webarena_instance.ui_login(site=site, page=page)
  File "/home/toolkit/dev/BrowserGym/browsergym/visualwebarena/src/browsergym/visualwebarena/instance.py", line 97, in ui_login
    page.get_by_role("button", name="Log in").click()
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/sync_api/_generated.py", line 15749, in click
    self._sync(
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_sync_base.py", line 109, in _sync
    return task.result()
           ^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_locator.py", line 159, in click
    return await self._frame.click(self._selector, strict=True, **params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_frame.py", line 483, in click
    await self._channel.send("click", locals_to_params(locals()))
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 490, in wrap_api_call
    return await cb()
           ^^^^^^^^^^
  File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 99, in inner_send
    result = next(iter(done)).result()
             ^^^^^^^^^^^^^^^^^^^^^^^^^
playwright._impl._api_types.TimeoutError: Timeout 10000ms exceeded.
=========================== logs ===========================
waiting for get_by_role("button", name="Log in")
  locator resolved to <button type="submit" class="btn btn-primary">Log in</button>
attempting click action
  waiting for element to be visible, enabled and stable
  element is visible, enabled and stable
  scrolling into view if needed
  done scrolling
  performing click action
  click action done
  waiting for scheduled navigations to finish
============================================================



4x : Exception uncaught by agent or environment in task <task_name>.
KeyboardInterrupt:
Early termination?

  • visualwebarena.317 seed: 31
  • visualwebarena.317 seed: 31
  • visualwebarena.319 seed: 0
  • visualwebarena.319 seed: 0

Showing Max 3 stack traces:


2024-11-18 15:45:15,048 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:46:41,571 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:48:08,084 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:49:31,546 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:50:56,273 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:52:20,791 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:53:44,993 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:55:08,929 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:56:32,925 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:57:57,838 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:59:22,834 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.


recursix · Nov 18 '24 18:11

The 133 timeout errors can't be related to the agent: they happen at the login step (during task initialization). The errors seem to stop when going from task 233 to 234, which is also a domain change (classifieds -> reddit). So I suspect something is wrong on the server side with the classifieds website. I'll take a look. (Attached: Screenshot 2024-11-19 at 8 44 11 AM)
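One way to check the "errors stop at a domain boundary" observation is to collapse the failing task IDs from the report into contiguous ranges. This is a minimal sketch, not part of BrowserGym or AgentLab: `failing_ranges` is a hypothetical helper, and the sample lines are a small subset copied from the bullet list above.

```python
import re


def failing_ranges(report_lines):
    """Collapse task IDs from report bullet lines into contiguous (start, end) ranges."""
    # Deduplicate IDs, since the report can list the same task more than once.
    ids = sorted({
        int(m.group(1))
        for line in report_lines
        if (m := re.search(r"visualwebarena\.(\d+)\s+seed:", line))
    })
    ranges = []
    for task_id in ids:
        if ranges and task_id == ranges[-1][1] + 1:
            # Extend the current range by one task.
            ranges[-1] = (ranges[-1][0], task_id)
        else:
            # Gap found: start a new range.
            ranges.append((task_id, task_id))
    return ranges


# A few lines taken from the error report above (not the full list).
sample = [
    "  • visualwebarena.88 seed: 26",
    "  • visualwebarena.89 seed: 0",
    "  • visualwebarena.90 seed: 13",
    "  • visualwebarena.232 seed: 33",
    "  • visualwebarena.233 seed: 20",
    "  • visualwebarena.305 seed: 18",
    "  • visualwebarena.305 seed: 18",
]
print(failing_ranges(sample))  # [(88, 90), (232, 233), (305, 305)]
```

Running it on the full report would show where the failing block begins and ends, along with any isolated outliers.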

gasse · Nov 19 '24 13:11

In my setup it's also hanging. There's no timeout or error message, but it hangs here:

2024-11-26 13:56:31,657 - 2973944 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-26 13:56:31,714 - 2973944 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-26 13:56:31,751 - 2973944 - root - WARNING - The content of the message has images, which are not displayed in the string representation.

I manually checked the websites and they work. I'm hosting the server in the same place as the run, if that matters.

xhluca · Nov 26 '24 23:11

How many are hanging?

recursix · Dec 05 '24 12:12

I had to restart 2-3 times, I think, so at least that many were hanging. Strangely, there's no timeout issue though.
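When a run hangs without raising any timeout, one generic workaround is to enforce a wall-clock deadline from outside the process, so a stuck job gets killed instead of needing a manual restart. This is a minimal sketch using only the Python standard library. `run_with_deadline` is a hypothetical helper, not a BrowserGym or AgentLab function, and the two commands are stand-ins for a real experiment command.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s):
    """Run cmd as a child process; return 'ok', 'failed', or 'timeout'."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


# Stand-ins for a real experiment command (hypothetical).
quick = [sys.executable, "-c", "pass"]
stuck = [sys.executable, "-c", "import time; time.sleep(30)"]

print(run_with_deadline(quick, timeout_s=10))  # ok
print(run_with_deadline(stuck, timeout_s=1))   # timeout
```

This only bounds how long a hang can last; it does not explain why the process stalls.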

xhluca · Dec 05 '24 20:12

A few changes have been made that should reduce the hanging, but we'll re-evaluate on other experiments.

recursix · Jul 16 '25 19:07