Add mouse position into env observation

Open ryanhoangt opened this issue 1 year ago • 4 comments

Hi, thanks for the project! I'm trying to implement and experiment with coordinate-based actions in BrowserGym, and it would be useful if the environment exposed the mouse position via the observation. What does the team think about this?

One quirk: there seems to be no direct way to get the mouse position from Playwright, so I use a somewhat hacky way to get that info.
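For readers wondering what such a workaround can look like, here is a minimal sketch, not necessarily the approach used in this issue. It assumes a Playwright sync-API `page` object and injects a `mousemove` listener that stores the last seen coordinates on `window`. The names `TRACKER_JS`, `READ_JS`, `install_mouse_tracker`, `read_mouse_position` and the `window.__mouse_pos` variable are hypothetical; only `page.add_init_script` and `page.evaluate` are Playwright calls.

```python
# Hypothetical helper, not BrowserGym code. `page` is assumed to be a
# Playwright sync-API Page (anything with add_init_script/evaluate works).

# Script that records the last mouse position seen by the document.
TRACKER_JS = """
(() => {
  if (window.__mouse_pos_installed) return;
  window.__mouse_pos_installed = true;
  window.__mouse_pos = null;
  document.addEventListener('mousemove', (e) => {
    // clientX/clientY are relative to the viewport.
    window.__mouse_pos = {x: e.clientX, y: e.clientY};
  }, true);
})();
"""

# Expression that reads the recorded position back (null if no move yet).
READ_JS = "() => window.__mouse_pos || null"


def install_mouse_tracker(page):
    """Install the listener for future documents and for the current one."""
    page.add_init_script(TRACKER_JS)  # runs on every later navigation
    page.evaluate(TRACKER_JS)         # covers the already-loaded document


def read_mouse_position(page):
    """Return (x, y) in viewport coordinates, or None if unknown."""
    pos = page.evaluate(READ_JS)
    if pos is None:
        return None
    return (pos["x"], pos["y"])
```

Under these assumptions the position stays unknown until the first mouse move after the listener is installed, and moves that happen inside iframes are not seen by the top-level document.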

ryanhoangt avatar Nov 28 '24 15:11 ryanhoangt

BTW, a cool way to try this feature is to run an openended agent on a whiteboard and ask it to draw simple forms, like we did for the demo video here https://github.com/ServiceNow/BrowserGym/

gasse avatar Dec 03 '24 19:12 gasse

Seems like there is pageX, pageY but also clientX, clientY https://michaelwornow.net/2024/01/02/display-x-y-coords-chrome-debugger

https://developer.mozilla.org/en-US/docs/Web/API/MouseEvent/clientX https://developer.mozilla.org/en-US/docs/Web/API/MouseEvent/pageX

Only way to know how / which one of these to use is to write some tests :)

gasse avatar Dec 03 '24 20:12 gasse

Seems like there is pageX, pageY but also clientX, clientY https://michaelwornow.net/2024/01/02/display-x-y-coords-chrome-debugger

From the blog, it seems clientX/clientY is relative to the viewport, and pageX/pageY is relative to the whole webpage. I think clientX/clientY is closer to what we want 🤔
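To make the difference concrete: per the MDN pages linked above, the page coordinates include the current scroll offset, so the two systems differ by exactly the scroll position. A minimal sketch, with hypothetical function names and the scroll offsets passed in as plain numbers (in a browser they would come from `window.scrollX` / `window.scrollY`):

```python
# Hypothetical helpers illustrating the relation
#   pageX = clientX + scrollX,  pageY = clientY + scrollY

def client_to_page(client_x, client_y, scroll_x, scroll_y):
    """Viewport-relative coordinates -> document-relative coordinates."""
    return (client_x + scroll_x, client_y + scroll_y)


def page_to_client(page_x, page_y, scroll_x, scroll_y):
    """Document-relative coordinates -> viewport-relative coordinates."""
    return (page_x - scroll_x, page_y - scroll_y)


# A pointer at (100, 50) in the viewport, with the page scrolled down 300 px,
# sits at (100, 350) in the document.
print(client_to_page(100, 50, 0, 300))
```

Since Playwright's `page.mouse` methods take coordinates relative to the top-left corner of the viewport, storing the viewport-relative pair would keep observations and coordinate-based actions in the same frame of reference.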

ryanhoangt avatar Dec 06 '24 10:12 ryanhoangt

I would like to move forward with this, but the current code will not work universally. Could we iterate on this? @ryanhoangt, are you still interested in working on this? This chat with Claude is inspiring. It seems we would need to update all action functions in bgym so that they update a global variable containing the appropriate info.

Might not be the best solution, but we could iterate on this.
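As a rough illustration of the global-variable idea, here is a minimal sketch in which a decorator records where each action leaves the mouse. Everything here is hypothetical: the decorator `tracks_mouse`, the getter `get_mouse_position`, and the two stand-in action functions, whose bodies are empty where a real implementation would drive Playwright.

```python
import functools

# Module-level state: last known mouse position, or None if unknown.
_last_mouse_position = None


def tracks_mouse(final_xy):
    """Decorator factory. `final_xy` maps the action's arguments to the
    (x, y) where the mouse ends up after the action."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            global _last_mouse_position
            result = fn(*args, **kwargs)
            # Only reached if the action did not raise.
            _last_mouse_position = final_xy(*args, **kwargs)
            return result
        return wrapper
    return decorator


def get_mouse_position():
    """Value that could be copied into the observation."""
    return _last_mouse_position


# Stand-in actions (assumption: real ones would call page.mouse.*).
@tracks_mouse(lambda x, y, button="left": (x, y))
def mouse_click(x, y, button="left"):
    pass


@tracks_mouse(lambda from_x, from_y, to_x, to_y: (to_x, to_y))
def mouse_drag_and_drop(from_x, from_y, to_x, to_y):
    pass
```

Element-based actions would need more work under this design, since their final position is not in the arguments and would have to be derived some other way, for example from the target element's bounding box.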

recursix avatar Jul 16 '25 20:07 recursix