BrowserGym
Add mouse position into env observation
Hi, thanks for the project! I'm trying to implement and experiment with coordinate-based actions from browsergym and it would be useful if the environment exposes this info via the observation. Not sure what the team thinks about this?
One quirk is that there seems to be no direct way to get the mouse position from Playwright, so I use a somewhat hacky way to get that info.
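For readers unfamiliar with this kind of workaround, here is a minimal sketch of one common approach, which is not necessarily the one used in this PR: inject a `mousemove` listener into the page and read the last recorded position back. The helper names and the `window.__mouse_pos` global are made up for illustration; the only Playwright calls assumed are `page.add_init_script()` and `page.evaluate()`.

```python
# Hypothetical helpers; only add_init_script() and evaluate() are Playwright calls.
TRACKER_JS = """
(() => {
  // Assumed global name, not part of BrowserGym or Playwright.
  window.__mouse_pos = null;
  document.addEventListener(
    "mousemove",
    (e) => { window.__mouse_pos = { x: e.clientX, y: e.clientY }; },
    true  // capture phase: runs before element-level handlers can stop propagation
  );
})();
"""


def install_mouse_tracker(page):
    """Register the listener so it runs in every document loaded after this call."""
    page.add_init_script(TRACKER_JS)


def read_mouse_position(page):
    """Return (x, y) in viewport coordinates, or None if no position was recorded."""
    pos = page.evaluate("() => window.__mouse_pos")
    if pos is None:
        return None
    return (pos["x"], pos["y"])
```

A sketch like this only sees events dispatched to the top-level document, so the position goes stale while the pointer is over an iframe, and it reports nothing until the mouse has moved at least once after a navigation.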
BTW, a cool way to try this feature is to run an open-ended agent on a whiteboard and ask it to draw simple shapes, like we did for the demo video here https://github.com/ServiceNow/BrowserGym/
Seems like there is pageX, pageY but also clientX, clientY https://michaelwornow.net/2024/01/02/display-x-y-coords-chrome-debugger
https://developer.mozilla.org/en-US/docs/Web/API/MouseEvent/clientX https://developer.mozilla.org/en-US/docs/Web/API/MouseEvent/pageX
Only way to know how / which one of these to use is to write some tests :)
From the blog, it seems like clientX/clientY is relative to the viewport, and pageX/pageY is relative to the whole webpage (so it also includes the scroll offset). I think clientX/clientY is closer to what we want 🤔
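To make the difference concrete, here is a small sketch of how the two coordinate systems relate in a top-level document: page coordinates are client coordinates plus the current scroll offset (`window.scrollX` / `window.scrollY`). The `Point` type and function names are illustrative only. As far as I know, Playwright's `page.mouse` methods take coordinates relative to the main frame's viewport, which would also argue for clientX/clientY.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float


def client_to_page(client: Point, scroll_x: float, scroll_y: float) -> Point:
    """Viewport-relative -> document-relative: add the scroll offset."""
    return Point(client.x + scroll_x, client.y + scroll_y)


def page_to_client(page: Point, scroll_x: float, scroll_y: float) -> Point:
    """Document-relative -> viewport-relative: subtract the scroll offset."""
    return Point(page.x - scroll_x, page.y - scroll_y)


# Page scrolled 300px down: a pointer at (100, 50) in the viewport
# sits at (100, 350) in the document.
print(client_to_page(Point(100, 50), scroll_x=0, scroll_y=300))
```

With no scrolling the two systems coincide, which is why a test on an unscrolled page cannot tell them apart.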
I would like to move forward with this, but the current code will not work universally. Could we iterate on this? @ryanhoangt, are you still interested in working on this? This chat with Claude is inspiring. It seems like we would need to update all action functions in bgym so that each one updates a global variable containing the appropriate info.
It might not be the best solution, but we could iterate on it.
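As a starting point for that iteration, here is a rough sketch of the "global variable updated by every action" idea, using a decorator so the bookkeeping lives in one place. Everything here is hypothetical: `_last_mouse_pos`, `tracks_mouse`, and the stand-in `mouse_move` / `mouse_click` functions are illustrative and are not BrowserGym's actual action implementations.

```python
import functools
from typing import Optional, Tuple

# Hypothetical module-level state; BrowserGym does not define this name.
_last_mouse_pos: Optional[Tuple[float, float]] = None


def get_last_mouse_pos() -> Optional[Tuple[float, float]]:
    """Return the last position recorded by a tracked action, or None."""
    return _last_mouse_pos


def tracks_mouse(action):
    """Wrap a coordinate-based action whose first two arguments are x and y."""

    @functools.wraps(action)
    def wrapper(x, y, *args, **kwargs):
        global _last_mouse_pos
        result = action(x, y, *args, **kwargs)
        # Record only after the action returned without raising.
        _last_mouse_pos = (x, y)
        return result

    return wrapper


@tracks_mouse
def mouse_move(x, y):
    """Stand-in: a real action would drive the browser here."""
    return None


@tracks_mouse
def mouse_click(x, y, button="left"):
    """Stand-in: a real action would drive the browser here."""
    return None
```

This only covers actions that receive explicit coordinates. Element-based actions would need to resolve the target element's position before they could update the same variable, which this sketch does not attempt.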