[FEATURE] Mouse position calibration
A known issue is that the detected mouse position is inaccurate. As a workaround, could it be calibrated? A screenshot could be captured, the mouse pointer detected in it, and its position calculated; the mouse would then be moved to a different position and the process repeated until the position is accurate enough.
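To make the idea concrete, here is a minimal sketch of that loop in Python. It is only an illustration under stated assumptions, not code from this project: `calibrate_move`, `FakeScreen`, and the two callbacks are hypothetical names. In practice `move_to` could be something like `pyautogui.moveTo`, and `detect_pointer` would need to capture a screenshot and locate the pointer in it, which is not implemented here. The sketch also assumes the error is roughly a fixed scale plus offset (for example from display scaling); the simple correction below only converges when that scale is between 0 and 2.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def calibrate_move(
    target: Point,
    move_to: Callable[[float, float], None],
    detect_pointer: Callable[[], Point],
    tolerance: float = 2.0,
    max_iterations: int = 10,
) -> Tuple[Point, int]:
    """Repeatedly move, observe, and correct until the pointer is near target.

    Returns the last commanded position and the number of moves made.
    """
    command_x, command_y = target
    for iteration in range(1, max_iterations + 1):
        move_to(command_x, command_y)
        seen_x, seen_y = detect_pointer()  # hypothetical: screenshot + pointer detection
        error_x = target[0] - seen_x
        error_y = target[1] - seen_y
        if abs(error_x) <= tolerance and abs(error_y) <= tolerance:
            return (command_x, command_y), iteration
        # Shift the next command by the observed error.
        command_x += error_x
        command_y += error_y
    return (command_x, command_y), max_iterations


class FakeScreen:
    """Test stand-in: the pointer lands at scale * command + offset."""

    def __init__(self, scale: float = 1.25, offset: Point = (12.0, -7.0)) -> None:
        self.scale = scale
        self.offset = offset
        self.pointer: Point = (0.0, 0.0)

    def move_to(self, x: float, y: float) -> None:
        self.pointer = (x * self.scale + self.offset[0],
                        y * self.scale + self.offset[1])

    def detect_pointer(self) -> Point:
        return self.pointer
```

With the fake screen, `calibrate_move((400.0, 300.0), screen.move_to, screen.detect_pointer)` settles within the 2-pixel tolerance after a few moves. On a real screen the hard part is the pointer detection step, and every iteration costs an extra screenshot.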
@osama-salah have you tried operate -m gpt-4-with-ocr? With the OCR approach, the click X & Y are now spot on for whatever GPT-4-v decided to click.
@joshbickett I use Gemini-pro-vision as I don't have a ChatGPT Plus subscription.
@joshbickett I am on Windows 10. I have tried both "operate -m gpt-4-with-ocr" and plain "operate", but in neither case could it click on the exact spot. Is there a specific resolution I should set my screen to?
@osama-salah oh ok. We could add Gemini with OCR, because the OCR uses an open source license that doesn't require a key. If someone could make a PR for that, it'd be great!
@mrkhalil6 if the button or link to click doesn't have text, then it will likely fail. Was it "missing" the button, or did it just not know what to click?