[FEATURE] Mouse position calibration

Open osama-salah opened this issue 1 year ago • 5 comments

A known issue is that the detected position of the mouse is not accurate. As a workaround, could it be calibrated? A screenshot could be captured, the mouse pointer detected in it, and its position calculated; the mouse would then be moved to a different position and the process repeated until the position accuracy improves.
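The calibration loop described above could be sketched roughly as follows. This is only an illustration, not code from the project: `move_to` and `detect_pointer` are hypothetical callables (the first would move the real mouse, the second would take a screenshot and return where the model believes the pointer is), and the sketch assumes the error is a constant offset rather than a scale factor.

```python
from statistics import mean
from typing import Callable, Iterable, Tuple

Point = Tuple[float, float]


def estimate_offset(
    move_to: Callable[[float, float], None],   # hypothetical: moves the mouse
    detect_pointer: Callable[[], Point],       # hypothetical: screenshot + detection
    targets: Iterable[Point],
) -> Point:
    """Average the (detected - commanded) error over several known positions."""
    dxs, dys = [], []
    for tx, ty in targets:
        move_to(tx, ty)               # put the pointer at a known position
        px, py = detect_pointer()     # where detection reports the pointer
        dxs.append(px - tx)
        dys.append(py - ty)
    return mean(dxs), mean(dys)


def corrected(target: Point, offset: Point) -> Point:
    """Map a position in the detector's frame back to real screen coordinates."""
    return target[0] - offset[0], target[1] - offset[1]
```

With an estimated offset in hand, a click position proposed by the model would be passed through `corrected` before moving the mouse. If the error grows with distance from the screen origin (for example because of display scaling), a per-axis linear fit would be a better model than a constant offset.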

osama-salah avatar Jan 25 '24 22:01 osama-salah

@osama-salah have you tried operate -m gpt-4-with-ocr? With the OCR approach, the click X and Y coordinates are now spot on for whatever GPT-4-v decides to click.

joshbickett avatar Jan 26 '24 16:01 joshbickett

@joshbickett I use Gemini-pro-vision, as I don't have a ChatGPT Plus subscription.

osama-salah avatar Jan 26 '24 22:01 osama-salah

@joshbickett I am on Windows 10. I have tried both "operate -m gpt-4-with-ocr" and plain "operate", but in neither case could it click on the exact spot. Is there a specific resolution I should set my screen to?

mrkhalil6 avatar Feb 03 '24 16:02 mrkhalil6

@osama-salah oh ok. We could add Gemini with OCR, because the OCR component is open source and doesn't require a key. If someone could make a PR for that, it'd be great!

joshbickett avatar Feb 09 '24 04:02 joshbickett

@mrkhalil6 if the button or link to click doesn't have text, then it will likely fail. Was it "missing" the button, or did it just not know what to click?

joshbickett avatar Feb 09 '24 04:02 joshbickett