
Feat: Navigating to Search Bar using `Cmd`/`Ctrl` + `L`

Open shubhexists opened this issue 1 year ago • 4 comments

As mentioned in the README, `Cmd` + `L` would probably be a better way to navigate to the search bar.

I also ran into trouble navigating to the search bar correctly, since different browsers may place their search bar in different locations: https://github.com/OthersideAI/self-operating-computer/issues/39#issuecomment-1836316268

If everyone approves, I can go ahead and implement this.

shubhexists avatar Dec 02 '23 02:12 shubhexists

As far as I can tell, there are two possible implementations for this:

  1. Change the prompt to ask the model to use `Cmd` + `L` to navigate to the search bar directly.
  2. Change the prompt to detect whether the active window is a browser and, if it is, use pyautogui to press `Cmd` + `L`.

Whichever would be more accurate, idk..
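For illustration, the second option could be sketched roughly as below. This is not code from the project: `address_bar_hotkey` and `focus_address_bar` are hypothetical names, and the sketch assumes the browser check has already happened elsewhere. The only pyautogui call used is `pyautogui.hotkey`, imported lazily so the key selection can be checked without a display.

```python
import platform
from typing import Callable, Optional, Tuple


def address_bar_hotkey(system: Optional[str] = None) -> Tuple[str, str]:
    """Return the key combination that focuses the browser address bar.

    Hypothetical helper: uses Cmd+L on macOS and Ctrl+L everywhere else.
    """
    system = system or platform.system()
    modifier = "command" if system == "Darwin" else "ctrl"
    return (modifier, "l")


def focus_address_bar(
    press: Optional[Callable[..., None]] = None,
    system: Optional[str] = None,
) -> Tuple[str, str]:
    """Press the address-bar hotkey and return the keys that were sent.

    `press` defaults to pyautogui.hotkey; passing another callable makes
    the function usable without a GUI (an assumption made for this sketch).
    """
    keys = address_bar_hotkey(system)
    if press is None:
        import pyautogui  # imported lazily; needs a display to load

        press = pyautogui.hotkey
    press(*keys)
    return keys
```

Calling `focus_address_bar()` with no arguments would send the hotkey to whichever window currently has focus, so it only helps if a browser is already in the foreground.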

shubhexists avatar Dec 02 '23 02:12 shubhexists

@shubhexists Based on the wording of this section in the README:

> We recognize that some operating system functions may be more efficiently executed with hotkeys such as entering the Browser Address bar using command + L rather than by simulating a mouse click at the correct XY location. We plan to make these improvements over time. However, it's important to note that many actions require the accurate selection of visual elements on the screen, necessitating precise XY mouse click locations. A primary focus of this project is to refine the accuracy of determining these click locations. We believe this is essential for achieving a fully self-operating computer in the current technological landscape.

It sounds like the project's primary focus at the moment is improving click accuracy. Moving the cursor to the browser's navigation bar is something this program will likely do a lot, which is probably why `Cmd` + `L` / `Ctrl` + `L` hasn't been implemented yet.

michaelhhogue avatar Dec 02 '23 14:12 michaelhhogue

Fine, makes sense :/ We can't get around the fact that accuracy is more important. These features can be implemented later.

shubhexists avatar Dec 02 '23 14:12 shubhexists

I think that #8 essentially handles this by creating a "command" key system for the prompt. I think this makes sense long term. The goal of this project is to let multi-modal models emulate a human's interaction with the computer as closely as possible. I still need to review #8.

joshbickett avatar Dec 07 '23 19:12 joshbickett

@shubhexists I'll close this now that it has been implemented as a standard part of the project.

joshbickett avatar Feb 09 '24 05:02 joshbickett