webarena

Getting the observation takes a long time during env.step/reset

zijianma17 opened this issue 11 months ago · 5 comments

Hi, I have been testing on your wonderful datasets for months, and I now plan to run more experiments on them. I found that _get_obs() takes a lot of time, in both the "reset" and the "step" functions. Could you give me some advice on how to speed up the process? Should I upgrade the hardware? It was acceptable for some small tests before, but now I need it to run faster. Thanks for your reply!
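Before changing hardware, it can help to measure where the time actually goes. Below is a minimal sketch, assuming the environment object exposes `reset`, `step`, and `_get_obs` as ordinary methods (the names come from this thread). `instrument_method` and `summarize` are hypothetical helpers, not part of WebArena; they wrap a method on one instance without editing library code.

```python
import time
from functools import wraps
from statistics import mean


def instrument_method(obj, name, timings):
    """Replace obj.<name> on this instance with a wrapper that records durations.

    `timings` is a plain dict mapping method name -> list of seconds.
    Because the wrapper is stored on the instance, internal calls such as
    self._get_obs() inside step() are timed as well.
    Returns the original bound method so it can be restored later.
    """
    original = getattr(obj, name)

    @wraps(original)
    def timed(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record the duration even if the call raises.
            timings.setdefault(name, []).append(time.perf_counter() - start)

    setattr(obj, name, timed)
    return original


def summarize(timings):
    """Return {method: (number_of_calls, mean_seconds, total_seconds)}."""
    return {
        name: (len(ds), mean(ds), sum(ds))
        for name, ds in timings.items()
        if ds
    }
```

For example, create `timings = {}`, call `instrument_method(env, name, timings)` for `"_get_obs"`, `"reset"`, and `"step"`, run a few episodes, and print `summarize(timings)`. If `_get_obs` accounts for nearly all of `step`, the bottleneck is observation extraction rather than page loading.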

zijianma17, Mar 11 '24

One thing to try is disabling headless mode to see what the browser is actually doing at that point. I also think hardware matters a lot, especially if you are running on servers: on a server without a GPU, Chromium's rendering engine cannot use hardware acceleration. Your best bet is still to check the browser visually (headless=false).
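WebArena drives Chromium through Playwright, so the same visual check can be done with Playwright directly. A minimal sketch, assuming Playwright for Python and its Chromium build are installed; `visible_launch_options` and `watch_page` are hypothetical helpers. `headless=False` opens a real window, and `slow_mo` delays every Playwright operation by the given number of milliseconds so each step can be followed by eye.

```python
def visible_launch_options(slow_mo_ms: float = 250.0) -> dict:
    """Keyword arguments for Playwright's chromium.launch() that make runs observable."""
    return {"headless": False, "slow_mo": slow_mo_ms}


def watch_page(url: str, seconds: float = 10.0) -> None:
    """Open `url` in a visible Chromium window and keep it open for `seconds`."""
    # Imported lazily so this module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**visible_launch_options())
        page = browser.new_page()
        page.goto(url)
        page.wait_for_timeout(seconds * 1000)  # wait_for_timeout takes milliseconds
        browser.close()
```

On a server, calling `watch_page("chrome://gpu")` should show Chromium's graphics feature status, which indicates whether hardware acceleration is available on that machine.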

frankxu2004, Mar 12 '24

Probably dupe of #66

wookayin, Mar 14 '24

Thanks. I do use headless=false to watch the browser, though. The issue is the same one @wookayin mentioned, and there still seems to be no good way to solve it (e.g. switching to async, which would need a lot of work).

zijianma17, Mar 14 '24

  1. env.step takes more than 10s almost every time on my computer. Any idea why it only takes about 2-3 seconds for you? Maybe it is due to the hardware?
  2. The most time-consuming function is get_bounding_client_rect(), just as described in #66 (which is unacceptable for a large number of experiments). I would like to try "current_viewport_only=False", in which case, in my opinion, the bounding box of each node is no longer needed. Am I right? Is this the only use of the bounding boxes?
  3. However, after I set node_bound to [0, 0, 10, 10] here and skip get_bounding_client_rect(), the env does run very fast, but element_id no longer works; perhaps the relevant dict is constructed incorrectly. Could you give me some advice on that? Thanks and have a nice week!
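One plausible explanation for item 3, assuming (as the behaviour described above suggests) that each node's bounding box serves two purposes: filtering nodes to the visible viewport, and turning an element id into screen coordinates for mouse actions. The helpers below are hypothetical illustrations, not WebArena's code. They show that a constant placeholder box makes every node look visible and sends every id-based click to the same point.

```python
from typing import Dict, List, Tuple

# [x, y, width, height] in CSS pixels, relative to the viewport
# (the shape returned by the DOM's getBoundingClientRect()).
Bound = List[float]


def in_viewport(bound: Bound, vw: float, vh: float) -> bool:
    """True if the box overlaps the viewport rectangle (0, 0, vw, vh)."""
    x, y, w, h = bound
    return x < vw and y < vh and x + w > 0 and y + h > 0


def element_center(bound: Bound) -> Tuple[float, float]:
    """The point a click on this element would target: the centre of its box."""
    x, y, w, h = bound
    return (x + w / 2, y + h / 2)


def resolve_click(node_bounds: Dict[str, Bound], element_id: str) -> Tuple[float, float]:
    """Map an element id from the observation to click coordinates."""
    return element_center(node_bounds[element_id])
```

With real boxes such as `{"1": [100, 40, 80, 20]}`, id `"1"` resolves to `(140.0, 50.0)`; with every box replaced by `[0, 0, 10, 10]`, every id resolves to `(5.0, 5.0)`, so clicks land on whatever sits in the top-left corner.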

zijianma17, Mar 17 '24

> The env.step needs more than 10s

Are you experimenting with our hosted websites? To test whether it is an issue with your own device, you can modify the code here to visit other websites, try both complicated ones (like Amazon) and simple ones (like the Google search home page), and time them.

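```python
# Replace the placeholder with different URLs (a heavy page and a light one) and time each run
s = """page.goto("<change to different URLs>")
page.scroll(down)"""
action_seq = s.split("\n")
```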
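To run that comparison over several sites in one go, a minimal harness could look like the following. `run_actions` is a hypothetical callable you supply that executes the action strings in your environment (how you parse and step them depends on your setup); `action_seq_for` and `time_urls` are likewise illustrative names, and the URLs are only examples.

```python
import time
from typing import Callable, Dict, Iterable, List


def action_seq_for(url: str) -> List[str]:
    """Build the same two-step action sequence as the snippet above, for one URL."""
    return [f'page.goto("{url}")', "page.scroll(down)"]


def time_urls(
    run_actions: Callable[[List[str]], None],
    urls: Iterable[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the mean wall-clock seconds needed to run each URL's action sequence."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results = {}
    for url in urls:
        seq = action_seq_for(url)
        durations = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_actions(seq)
            durations.append(time.perf_counter() - start)
        results[url] = sum(durations) / len(durations)
    return results
```

If a simple page such as the Google home page is also slow on your machine, the device is the likely cause; if only complex pages are slow, the per-node work in observation extraction is the more likely culprit.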

> 2 and 3

Due to the current implementation, actions do not work when current_viewport_only=False. This is similar to #102. I have started working on it.

shuyanzhou, Mar 23 '24

https://github.com/web-arena-x/webarena/issues/66#issuecomment-2145514211

shuyanzhou, Jun 03 '24