BrowserGym
task hanging in VWA with vision agent
It's weird that the non-vision agent had far fewer errors than the vision-based agent.
INFO:root:All jobs are finished. Calling agent_args.close() on all agents...
INFO:root:Experiment finished.
Searching experiments directories.: 100%|██████████| 1266/1266 [00:00<00:00, 105094.19it/s]
Loading results: 100%|██████████| 910/910 [00:00<00:00, 2803.40it/s]
INFO:agentlab.experiments.study:
                                                    avg_reward  std_err  avg_steps  n_completed  n_err  cum_cost
agent.agent_name                    env.benchmark
GenericAgent-gpt-4o-mini-2024-07-18 visualwebarena       0.146    0.012     10.081      910/910    134   75.9531
INFO:root:Found 134 incomplete experiments in /home/toolkit/agentlab_results/2024-11-18_02-26-48_2-agents-on-2-benchmarks/2024-11-18_09-09-04_genericagent-gpt-4o-mini-2024-07-18-on-visualwebarena.
INFO:root:Make sure the processes that were running are all stopped. Otherwise,
WARNING:agentlab.experiments.study:Study genericagent-gpt-4o-mini-2024-07-18-on-visualwebarena did not finish after 3 trials. There are 134 incomplete experiments.
Searching experiments directories.: 100%|██████████| 2401/2401 [00:00<00:00, 148941.40it/s]
Loading results: 100%|██████████| 1820/1820 [00:00<00:00, 44543.97it/s]
INFO:agentlab.experiments.study:
                                                                                   avg_reward  std_err  avg_steps  n_completed  n_err  cum_cost
agent.agent_name                    env.benchmark  agent.flags.obs.use_screenshot
GenericAgent-gpt-4o-mini-2024-07-18 visualwebarena False                                0.141    0.012     12.130      910/910      3   27.5141
GenericAgent-gpt-4o-mini-2024-07-18 visualwebarena True                                 0.146    0.012     10.081      910/910    134   75.9531
133x : Exception uncaught by agent or environment in task <task_name>.
TimeoutError:
Timeout 10000ms exceeded.
========================
- visualwebarena.100 seed: 25
- visualwebarena.101 seed: 12
- visualwebarena.102 seed: 31
- visualwebarena.103 seed: 31
- visualwebarena.104 seed: 3
- visualwebarena.105 seed: 29
- visualwebarena.106 seed: 22
- visualwebarena.107 seed: 14
- visualwebarena.108 seed: 28
- visualwebarena.109 seed: 12
- visualwebarena.110 seed: 31
- visualwebarena.111 seed: 6
- visualwebarena.112 seed: 21
- visualwebarena.113 seed: 27
- visualwebarena.114 seed: 1
- visualwebarena.115 seed: 5
- visualwebarena.116 seed: 27
- visualwebarena.117 seed: 27
- visualwebarena.119 seed: 29
- visualwebarena.120 seed: 10
- visualwebarena.121 seed: 27
- visualwebarena.122 seed: 24
- visualwebarena.123 seed: 32
- visualwebarena.124 seed: 0
- visualwebarena.125 seed: 26
- visualwebarena.126 seed: 12
- visualwebarena.127 seed: 2
- visualwebarena.128 seed: 5
- visualwebarena.129 seed: 7
- visualwebarena.130 seed: 26
- visualwebarena.131 seed: 8
- visualwebarena.132 seed: 32
- visualwebarena.133 seed: 23
- visualwebarena.134 seed: 14
- visualwebarena.135 seed: 31
- visualwebarena.136 seed: 31
- visualwebarena.137 seed: 23
- visualwebarena.138 seed: 11
- visualwebarena.139 seed: 1
- visualwebarena.140 seed: 2
- visualwebarena.141 seed: 16
- visualwebarena.142 seed: 1
- visualwebarena.146 seed: 31
- visualwebarena.147 seed: 32
- visualwebarena.148 seed: 0
- visualwebarena.149 seed: 18
- visualwebarena.150 seed: 1
- visualwebarena.151 seed: 25
- visualwebarena.152 seed: 31
- visualwebarena.153 seed: 5
- visualwebarena.154 seed: 31
- visualwebarena.156 seed: 10
- visualwebarena.157 seed: 16
- visualwebarena.158 seed: 23
- visualwebarena.161 seed: 5
- visualwebarena.162 seed: 21
- visualwebarena.163 seed: 10
- visualwebarena.164 seed: 15
- visualwebarena.165 seed: 32
- visualwebarena.166 seed: 8
- visualwebarena.167 seed: 5
- visualwebarena.168 seed: 15
- visualwebarena.169 seed: 28
- visualwebarena.170 seed: 2
- visualwebarena.171 seed: 19
- visualwebarena.172 seed: 18
- visualwebarena.173 seed: 25
- visualwebarena.174 seed: 2
- visualwebarena.175 seed: 18
- visualwebarena.176 seed: 19
- visualwebarena.177 seed: 31
- visualwebarena.178 seed: 6
- visualwebarena.179 seed: 32
- visualwebarena.180 seed: 17
- visualwebarena.181 seed: 0
- visualwebarena.182 seed: 10
- visualwebarena.183 seed: 27
- visualwebarena.184 seed: 24
- visualwebarena.185 seed: 22
- visualwebarena.186 seed: 30
- visualwebarena.187 seed: 29
- visualwebarena.188 seed: 6
- visualwebarena.189 seed: 15
- visualwebarena.190 seed: 25
- visualwebarena.191 seed: 1
- visualwebarena.192 seed: 0
- visualwebarena.193 seed: 11
- visualwebarena.194 seed: 4
- visualwebarena.195 seed: 31
- visualwebarena.196 seed: 8
- visualwebarena.197 seed: 18
- visualwebarena.198 seed: 15
- visualwebarena.199 seed: 2
- visualwebarena.200 seed: 19
- visualwebarena.201 seed: 23
- visualwebarena.202 seed: 32
- visualwebarena.204 seed: 10
- visualwebarena.206 seed: 19
- visualwebarena.207 seed: 24
- visualwebarena.209 seed: 28
- visualwebarena.210 seed: 17
- visualwebarena.211 seed: 17
- visualwebarena.212 seed: 1
- visualwebarena.214 seed: 32
- visualwebarena.215 seed: 3
- visualwebarena.216 seed: 32
- visualwebarena.218 seed: 20
- visualwebarena.220 seed: 7
- visualwebarena.221 seed: 6
- visualwebarena.224 seed: 32
- visualwebarena.225 seed: 11
- visualwebarena.226 seed: 21
- visualwebarena.227 seed: 21
- visualwebarena.228 seed: 29
- visualwebarena.229 seed: 7
- visualwebarena.230 seed: 26
- visualwebarena.231 seed: 26
- visualwebarena.232 seed: 33
- visualwebarena.233 seed: 20
- visualwebarena.305 seed: 18
- visualwebarena.305 seed: 18
- visualwebarena.88 seed: 26
- visualwebarena.89 seed: 0
- visualwebarena.90 seed: 13
- visualwebarena.91 seed: 2
- visualwebarena.92 seed: 0
- visualwebarena.93 seed: 4
- visualwebarena.94 seed: 25
- visualwebarena.95 seed: 13
- visualwebarena.96 seed: 26
- visualwebarena.97 seed: 8
- visualwebarena.98 seed: 14
- visualwebarena.99 seed: 14
Showing Max 3 stack traces:
2024-11-18 15:45:43,101 - 2060124 - browsergym.experiments.loop - WARNING - Exception uncaught by agent or environment in task visualwebarena.100.
TimeoutError:
Timeout 10000ms exceeded.
=========================== logs ===========================
waiting for get_by_role("button", name="Log in")
locator resolved to <button type="submit" class="btn btn-primary">Log in</button>
attempting click action
waiting for element to be visible, enabled and stable
element is visible, enabled and stable
scrolling into view if needed
...
...truncated middle of the log
...
File "/home/toolkit/dev/BrowserGym/browsergym/experiments/src/browsergym/experiments/loop.py", line 437, in from_reset
self.obs, env_info = env.reset(seed=seed)
^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/gymnasium/wrappers/time_limit.py", line 75, in reset
return self.env.reset(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/gymnasium/wrappers/order_enforcing.py", line 61, in reset
return self.env.reset(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/dev/BrowserGym/browsergym/core/src/browsergym/core/env.py", line 303, in reset
task_goal, task_info = self.task.setup(page=self.page)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/dev/BrowserGym/browsergym/visualwebarena/src/browsergym/visualwebarena/task.py", line 218, in setup
self.webarena_instance.ui_login(site=site, page=page)
File "/home/toolkit/dev/BrowserGym/browsergym/visualwebarena/src/browsergym/visualwebarena/instance.py", line 97, in ui_login
page.get_by_role("button", name="Log in").click()
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/sync_api/_generated.py", line 15749, in click
self._sync(
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_sync_base.py", line 109, in _sync
return task.result()
^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_locator.py", line 159, in click
return await self._frame.click(self._selector, strict=True, **params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_frame.py", line 483, in click
await self._channel.send("click", locals_to_params(locals()))
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 61, in send
return await self._connection.wrap_api_call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 490, in wrap_api_call
return await cb()
^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 99, in inner_send
result = next(iter(done)).result()
^^^^^^^^^^^^^^^^^^^^^^^^^
playwright._impl._api_types.TimeoutError: Timeout 10000ms exceeded.
=========================== logs ===========================
waiting for get_by_role("button", name="Log in")
locator resolved to <button type="submit" class="btn btn-primary">Log in</button>
attempting click action
waiting for element to be visible, enabled and stable
element is visible, enabled and stable
scrolling into view if needed
done scrolling
performing click action
click action done
waiting for scheduled navigations to finish
============================================================
2024-11-18 15:45:44,621 - 2060127 - browsergym.experiments.loop - WARNING - Exception uncaught by agent or environment in task visualwebarena.101.
TimeoutError:
Timeout 10000ms exceeded.
=========================== logs ===========================
waiting for get_by_role("button", name="Log in")
locator resolved to <button type="submit" class="btn btn-primary">Log in</button>
attempting click action
waiting for element to be visible, enabled and stable
element is visible, enabled and stable
scrolling into view if needed
...
...truncated middle of the log
...
File "/home/toolkit/dev/BrowserGym/browsergym/experiments/src/browsergym/experiments/loop.py", line 437, in from_reset
self.obs, env_info = env.reset(seed=seed)
^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/gymnasium/wrappers/time_limit.py", line 75, in reset
return self.env.reset(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/gymnasium/wrappers/order_enforcing.py", line 61, in reset
return self.env.reset(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/dev/BrowserGym/browsergym/core/src/browsergym/core/env.py", line 303, in reset
task_goal, task_info = self.task.setup(page=self.page)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/dev/BrowserGym/browsergym/visualwebarena/src/browsergym/visualwebarena/task.py", line 218, in setup
self.webarena_instance.ui_login(site=site, page=page)
File "/home/toolkit/dev/BrowserGym/browsergym/visualwebarena/src/browsergym/visualwebarena/instance.py", line 97, in ui_login
page.get_by_role("button", name="Log in").click()
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/sync_api/_generated.py", line 15749, in click
self._sync(
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_sync_base.py", line 109, in _sync
return task.result()
^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_locator.py", line 159, in click
return await self._frame.click(self._selector, strict=True, **params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_frame.py", line 483, in click
await self._channel.send("click", locals_to_params(locals()))
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 61, in send
return await self._connection.wrap_api_call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 490, in wrap_api_call
return await cb()
^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 99, in inner_send
result = next(iter(done)).result()
^^^^^^^^^^^^^^^^^^^^^^^^^
playwright._impl._api_types.TimeoutError: Timeout 10000ms exceeded.
=========================== logs ===========================
waiting for get_by_role("button", name="Log in")
locator resolved to <button type="submit" class="btn btn-primary">Log in</button>
attempting click action
waiting for element to be visible, enabled and stable
element is visible, enabled and stable
scrolling into view if needed
done scrolling
performing click action
click action done
waiting for scheduled navigations to finish
============================================================
2024-11-18 15:46:01,765 - 2060128 - browsergym.experiments.loop - WARNING - Exception uncaught by agent or environment in task visualwebarena.102.
TimeoutError:
Timeout 10000ms exceeded.
=========================== logs ===========================
waiting for get_by_role("button", name="Log in")
locator resolved to <button type="submit" class="btn btn-primary">Log in</button>
attempting click action
waiting for element to be visible, enabled and stable
element is visible, enabled and stable
scrolling into view if needed
...
...truncated middle of the log
...
File "/home/toolkit/dev/BrowserGym/browsergym/experiments/src/browsergym/experiments/loop.py", line 437, in from_reset
self.obs, env_info = env.reset(seed=seed)
^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/gymnasium/wrappers/time_limit.py", line 75, in reset
return self.env.reset(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/gymnasium/wrappers/order_enforcing.py", line 61, in reset
return self.env.reset(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/dev/BrowserGym/browsergym/core/src/browsergym/core/env.py", line 303, in reset
task_goal, task_info = self.task.setup(page=self.page)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/dev/BrowserGym/browsergym/visualwebarena/src/browsergym/visualwebarena/task.py", line 218, in setup
self.webarena_instance.ui_login(site=site, page=page)
File "/home/toolkit/dev/BrowserGym/browsergym/visualwebarena/src/browsergym/visualwebarena/instance.py", line 97, in ui_login
page.get_by_role("button", name="Log in").click()
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/sync_api/_generated.py", line 15749, in click
self._sync(
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_sync_base.py", line 109, in _sync
return task.result()
^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_locator.py", line 159, in click
return await self._frame.click(self._selector, strict=True, **params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_frame.py", line 483, in click
await self._channel.send("click", locals_to_params(locals()))
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 61, in send
return await self._connection.wrap_api_call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 490, in wrap_api_call
return await cb()
^^^^^^^^^^
File "/home/toolkit/micromamba/envs/ui-assist/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 99, in inner_send
result = next(iter(done)).result()
^^^^^^^^^^^^^^^^^^^^^^^^^
playwright._impl._api_types.TimeoutError: Timeout 10000ms exceeded.
=========================== logs ===========================
waiting for get_by_role("button", name="Log in")
locator resolved to <button type="submit" class="btn btn-primary">Log in</button>
attempting click action
waiting for element to be visible, enabled and stable
element is visible, enabled and stable
scrolling into view if needed
done scrolling
performing click action
click action done
waiting for scheduled navigations to finish
============================================================
4x : Exception uncaught by agent or environment in task <task_name>.
KeyboardInterrupt:
Early termination?
- visualwebarena.317 seed: 31
- visualwebarena.317 seed: 31
- visualwebarena.319 seed: 0
- visualwebarena.319 seed: 0
Showing Max 3 stack traces:
2024-11-18 15:45:15,048 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:46:41,571 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:48:08,084 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:49:31,546 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:50:56,273 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:52:20,791 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:53:44,993 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:55:08,929 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:56:32,925 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:57:57,838 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-18 15:59:22,834 - 2060122 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
The 133 timeout errors can't be caused by the agent: they happen at the login step, during task initialization, before the agent takes any action. The errors seem to stop when going from task 233 to task 234, which is also where the domain changes (classifieds -> reddit). So I suspect something is wrong on the server side with the classifieds website. I'll take a look.
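To check the pattern described above against the full error list, a short script can parse the "- visualwebarena.N seed: S" lines and split the unique failures at a task-id boundary. This is a sketch: the boundary of 233 is taken from the observation above, and the script only compares task ids; it does not know which website each task actually uses. Duplicate entries (the list contains visualwebarena.305 seed: 18 twice) are counted once.

```python
import re
from collections import Counter

# Matches summary lines such as "- visualwebarena.100 seed: 25".
FAILURE_LINE = re.compile(r"^\s*-\s*visualwebarena\.(\d+)\s+seed:\s*(\d+)\s*$")


def parse_failures(report: str) -> list[tuple[int, int]]:
    """Return (task_id, seed) pairs in report order, duplicates included."""
    pairs = []
    for line in report.splitlines():
        match = FAILURE_LINE.match(line)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


def split_at_boundary(pairs: list[tuple[int, int]], boundary: int = 233) -> dict[str, int]:
    """Count unique (task_id, seed) failures at or below vs. above a task-id boundary."""
    counts = Counter(
        f"<= {boundary}" if task_id <= boundary else f"> {boundary}"
        for task_id, _seed in set(pairs)
    )
    return dict(counts)


if __name__ == "__main__":
    # A few lines copied from the error summary; paste the full list in practice.
    excerpt = """\
- visualwebarena.100 seed: 25
- visualwebarena.233 seed: 20
- visualwebarena.305 seed: 18
- visualwebarena.305 seed: 18
"""
    pairs = parse_failures(excerpt)
    print(len(pairs), split_at_boundary(pairs))
```

Lines that don't match the pattern (log headers, stack traces) are simply ignored, so the whole report can be pasted in as-is.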
In my setup it also hangs. There's no timeout or error message; it just hangs here:
2024-11-26 13:56:31,657 - 2973944 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-26 13:56:31,714 - 2973944 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
2024-11-26 13:56:31,751 - 2973944 - root - WARNING - The content of the message has images, which are not displayed in the string representation.
I checked the websites manually and they work. I'm hosting the server in the same place as the run, if that matters.
How many are hanging?
I had to restart 2-3 times, I think, so at least that many were hanging. Strangely, though, there's no timeout error.
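Because this hang produces no timeout or error, one way to at least detect it is to run each blocking step under a wall-clock watchdog. The sketch below is a generic Python illustration, not how BrowserGym or AgentLab handle this; `run_with_watchdog` is a hypothetical helper and the timeout value is up to the caller. A Python thread cannot be killed, so the hung call keeps running in the background; actually terminating it would likely require running the step in a separate process.

```python
import threading
from typing import Any, Callable


def run_with_watchdog(fn: Callable[[], Any], timeout_s: float) -> tuple[str, Any]:
    """Run fn in a daemon thread and report how it ended.

    Returns ("ok", value), ("error", exception), or ("hung", None) if fn has
    not returned within timeout_s seconds. A hung call is NOT stopped; it
    keeps running in the background, this only makes the hang visible.
    """
    result: dict[str, Any] = {}

    def target() -> None:
        try:
            result["value"] = fn()
        except BaseException as exc:  # hand any failure back to the caller
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)  # wait at most timeout_s seconds
    if worker.is_alive():
        return ("hung", None)
    if "error" in result:
        return ("error", result["error"])
    return ("ok", result.get("value"))
```

A caller could log the task name and seed whenever the status is "hung" and move on, instead of the run silently blocking until someone restarts it.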
A few changes have been made that should reduce the hanging, but we'll re-evaluate on other experiments.
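The thread doesn't say what those changes were. As one generic illustration of making a flaky setup step such as the login more robust, a retry with exponential backoff could look like the sketch below; `retry` is a hypothetical helper, not part of BrowserGym.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> T:
    """Call fn, retrying on the given exception types with exponential backoff.

    Exceptions not listed in retry_on propagate immediately. After the last
    failed attempt, the most recent retryable exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_exc: Optional[BaseException] = None
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on as exc:
            last_exc = exc
            if attempt < attempts - 1:
                # Wait delay_s, then 2 * delay_s, then 4 * delay_s, ...
                time.sleep(delay_s * (2 ** attempt))
    assert last_exc is not None
    raise last_exc
```

Playwright raises its own `TimeoutError` (importable as `from playwright.sync_api import TimeoutError as PlaywrightTimeoutError`), which is not the built-in one, so it would need to be passed via `retry_on`. Since the logs show the click itself completed and only the wait for the following navigation timed out, it would make more sense to retry the whole login step than the single click. Retrying only helps with transient slowness; if the classifieds server is genuinely broken, every attempt will fail the same way.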