self-operating-computer

Scrolling up and down not added

Open klxu03 opened this issue 1 year ago • 5 comments

I just noticed that the model doesn't have access to scrolling up and down. Is this difficult to implement in general? (I'm asking mostly for Linux, but of course I'm also interested in Mac and Windows.)

If so, I may try adding in a web mode and leverage Selenium to scroll.

klxu03 avatar Dec 04 '23 06:12 klxu03

How about using pyautogui to scroll by pressing the down arrow key? The issue is whether the model (GPT-4) can identify when it has to scroll.

shubhexists avatar Dec 04 '23 06:12 shubhexists

@klxu03 @shubhexists Here's an example of how you could scroll using PyAutoGUI from https://pyautogui.readthedocs.io/en/latest/mouse.html. This would probably be preferred over simulating any scrolling as the .scroll(...) function scrolls as a human would with a mouse wheel. This should also be multi-platform.

>>> import pyautogui
>>> pyautogui.scroll(10)   # scroll up 10 "clicks"
>>> pyautogui.scroll(-10)  # scroll down 10 "clicks"
>>> pyautogui.scroll(10, x=100, y=100)  # move mouse cursor to 100, 100, then scroll up 10 "clicks"

The CLICK action and its prompt could be modified so the model's response can also include a scroll amount, or something like that.
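
To make that idea a bit more concrete, here is a rough sketch of parsing a CLICK response that carries an optional scroll amount. The response format, the `scroll` field, and the `parse_click_action` name are all assumptions made up for illustration; this is not the project's actual prompt format.

```python
import json


def parse_click_action(response: str) -> dict:
    """Parse a hypothetical 'CLICK {...}' response with an optional 'scroll' field.

    Assumed format (not the project's real one):
        CLICK {"x": "50%", "y": "25%", "scroll": -10}
    """
    prefix = "CLICK "
    if not response.startswith(prefix):
        raise ValueError("not a CLICK action")

    payload = json.loads(response[len(prefix):])
    return {
        # Percentages are given as strings such as "50%".
        "x": float(str(payload["x"]).rstrip("%")),
        "y": float(str(payload["y"]).rstrip("%")),
        # A missing 'scroll' field means "do not scroll".
        "scroll": int(payload.get("scroll", 0)),
    }
```

A positive `scroll` value would map to `pyautogui.scroll(...)` scrolling up and a negative one to scrolling down, matching the examples above.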

I might open a PR for this if you're not working on it @klxu03.

michaelhhogue avatar Dec 04 '23 12:12 michaelhhogue

Interested to see how the scroll performs. I'll take a look this week

joshbickett avatar Dec 06 '23 16:12 joshbickett

@michaelhhogue Thanks for sending! Yeah, maybe you can open up a PR with pyautogui so we can see how it performs. I'm not sure how easy it would be to bake this into the prompt and for GPT to figure out when it needs to scroll.

When I talked about baking it into Selenium, I meant with additional functionality, like Selenium taking a picture of the entire website (including the parts of the site below the current viewport) so the model knows what is down there.
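
As a rough sketch of that idea, one option is to scroll through the page one viewport at a time and take a screenshot at each stop, rather than producing a single stitched image. The helper names `scroll_offsets` and `capture_page` are hypothetical; the only Selenium calls assumed are `execute_script` and `get_screenshot_as_png` on an already-created WebDriver.

```python
def scroll_offsets(page_height: int, viewport_height: int) -> list:
    """Return the vertical offsets that cover the page one viewport at a time."""
    if viewport_height <= 0:
        raise ValueError("viewport_height must be positive")
    # max(..., 1) ensures at least one screenshot, even for an empty page.
    return list(range(0, max(page_height, 1), viewport_height))


def capture_page(driver) -> list:
    """Scroll through the page and return one PNG (as bytes) per viewport.

    `driver` is assumed to be a Selenium WebDriver with a page already loaded.
    """
    page_height = driver.execute_script(
        "return document.documentElement.scrollHeight"
    )
    viewport_height = driver.execute_script("return window.innerHeight")

    shots = []
    for offset in scroll_offsets(page_height, viewport_height):
        driver.execute_script("window.scrollTo(0, arguments[0]);", offset)
        shots.append(driver.get_screenshot_as_png())
    return shots
```

The model could then be shown all of the screenshots, or only the ones below the current viewport, before deciding whether to scroll.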

klxu03 avatar Dec 07 '23 02:12 klxu03

@klxu03 I have #76 opened! Feel free to clone the PR and try it out. Let me know if you have any feedback or suggestions.

I have it taking more of an "exploration" approach, as a human would, rather than knowing ahead of time what will be shown after scrolling. When the model scrolls, it can choose not to left click, so that it doesn't accidentally click on something after the scroll.
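
Roughly, the scroll-then-maybe-click behavior could be sketched like this. This is only an illustration and not the code in the PR: the `run_action` helper and the `scroll` and `click` field names are made up, and the scroll and click functions are passed in so the logic can be tried without moving the real mouse.

```python
def run_action(action: dict, scroll_fn, click_fn) -> None:
    """Scroll first if requested, then left click unless the action opts out.

    `action` is a hypothetical dict such as:
        {"x": 100, "y": 200, "scroll": -10, "click": False}
    where x and y are pixel coordinates.
    """
    if action.get("scroll", 0):
        # Same call shape as pyautogui.scroll(clicks, x=..., y=...).
        scroll_fn(action["scroll"], x=action["x"], y=action["y"])

    # Skipping the click avoids hitting whatever ends up under the cursor
    # after the page has moved.
    if action.get("click", True):
        click_fn(action["x"], action["y"])
```

With PyAutoGUI installed, this could be called as `run_action(action, pyautogui.scroll, pyautogui.click)`.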

michaelhhogue avatar Dec 07 '23 02:12 michaelhhogue