Josh Bickett

99 comments by Josh Bickett

Ok sounds good. I wonder if it makes sense to create a `linux-readme.md` or something like that which we can link to from the `readme.md` for a more in-depth install?...

> @ahsin-s this is a known limitation of GPT-4V at the moment. From [OpenAI Documentation,](https://platform.openai.com/docs/guides/vision/limitations) "**Limitations:** [...] Spatial reasoning: The model struggles with tasks requiring precise spatial localization, such as...

> Project maintainers don't seem to point out that gpt4v is NOT leveraging the grid overlay to estimate coordinates and is relying on heuristics in its training data instead. GPT-4-V...

@khalidovicGPT yes, I saw that. It is certainly an exciting update. Would you be interested in attempting a PR of Ferret into the Self-Operating Computer?

@khalidovicGPT sounds good. I briefly tried to run Ferret but realized it required more time than I had yesterday morning. From briefly reading the paper it...

I wanted to mention the [OCR approach](https://x.com/josh_bickett/status/1750569783568011364?s=20) for those who have not seen it. It goes partway toward solving this issue.

> I'll keep you updated if I manage to make more progress on my end. Good luck, Josh

@khalidovicGPT curious if you were able to make any more progress?

I'll close this ticket for now since we have the OCR approach!

@michaelhhogue thanks for this PR. Still need to review it. I'll let you know if I have any questions!

@michaelhhogue had a chance to take a closer look. I think this approach is very interesting. It makes sense to expand this action from `click` to `mouse` and to closer...