Add a grid of coordinates
Since it tends to misclick a lot, you could either train a model to do image segmentation, or, with some clever prompt engineering, overlay a barebones grid and ask the model to solve the puzzle of "in which coordinate can the search button be found?" That should make it more robust, right? (A rough sketch of what I mean follows the links below.)
See also:
- https://github.com/OthersideAI/self-operating-computer/issues/3
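To make the idea concrete, here is a minimal sketch (not the project's actual code) of overlaying a labeled coordinate grid on a screenshot with Pillow, so the prompt can ask which cell contains the target element. The cell size, colors, and label format are all assumptions.

```python
# Hypothetical sketch: overlay a labeled coordinate grid on a screenshot so the
# vision model can answer "which cell contains the search button?".
# Cell size and label style are assumptions, not the project's actual settings.
from PIL import Image, ImageDraw


def add_coordinate_grid(image_path: str, out_path: str, cell: int = 100) -> None:
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    width, height = img.size

    # Vertical and horizontal grid lines every `cell` pixels.
    for x in range(0, width, cell):
        draw.line([(x, 0), (x, height)], fill=(255, 0, 0), width=1)
    for y in range(0, height, cell):
        draw.line([(0, y), (width, y)], fill=(255, 0, 0), width=1)

    # Label each cell with a "column,row" coordinate the model can cite back.
    for x in range(0, width, cell):
        for y in range(0, height, cell):
            draw.text((x + 2, y + 2), f"{x // cell},{y // cell}", fill=(255, 0, 0))

    img.save(out_path)


# Usage: add_coordinate_grid("screenshot.png", "screenshot_grid.png")
```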
> add a barebones grid

I noticed that you currently seem to apply a grid to the images to assist the vision model:

_Originally posted by @0xdevalias in https://github.com/OthersideAI/self-operating-computer/issues/3_
How about applying a dynamic grid approach to enhance click accuracy?
For example, we could adjust the grid density based on the proximity to the cursor. The areas closer to the cursor would have a denser grid, allowing for more accurate click predictions.
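As a rough illustration of that idea (purely a sketch, with made-up cell sizes and window radius), one option is to draw a coarse grid over the whole screenshot and a much finer grid only inside a window around the current cursor position:

```python
# Hypothetical sketch of the "denser grid near the cursor" idea: a coarse grid
# everywhere, plus a fine grid inside a window around the cursor. The cell
# sizes and radius below are assumptions, not tuned values.
from PIL import Image, ImageDraw


def add_dynamic_grid(
    image_path: str,
    out_path: str,
    cursor_xy: tuple[int, int],
    coarse: int = 200,
    fine: int = 25,
    radius: int = 300,
) -> None:
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    width, height = img.size
    cx, cy = cursor_xy

    # Coarse grid over the entire image.
    for x in range(0, width, coarse):
        draw.line([(x, 0), (x, height)], fill=(0, 0, 255), width=1)
    for y in range(0, height, coarse):
        draw.line([(0, y), (width, y)], fill=(0, 0, 255), width=1)

    # Fine grid only inside the window around the cursor, where precise
    # click predictions matter most.
    left, top = max(0, cx - radius), max(0, cy - radius)
    right, bottom = min(width, cx + radius), min(height, cy + radius)
    for x in range(left, right, fine):
        draw.line([(x, top), (x, bottom)], fill=(255, 0, 0), width=1)
    for y in range(top, bottom, fine):
        draw.line([(left, y), (right, y)], fill=(255, 0, 0), width=1)

    img.save(out_path)


# Usage: add_dynamic_grid("screenshot.png", "screenshot_dyn.png", cursor_xy=(640, 400))
```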
Set-of-Mark prompting is now available. Swap in the `best.pt` from your best YOLOv8 run and see how it does.
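For anyone trying this, here is a hedged sketch (not the repository's actual pipeline) of what Set-of-Mark-style preprocessing could look like with an Ultralytics YOLOv8 checkpoint: detect UI elements, draw a numbered mark on each box, and let the model answer with a mark index instead of raw pixel coordinates. The function name, output path, and drawing details are assumptions.

```python
# Hypothetical Set-of-Mark-style preprocessing with a custom YOLOv8 checkpoint.
# `best.pt` is assumed to be the weights you trained for UI-element detection.
from PIL import Image, ImageDraw
from ultralytics import YOLO


def mark_ui_elements(image_path: str, weights: str = "best.pt") -> list[tuple[int, int]]:
    model = YOLO(weights)
    result = model(image_path)[0]  # single image -> single Results object

    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    centers = []

    for idx, box in enumerate(result.boxes.xyxy.tolist()):
        x1, y1, x2, y2 = box
        # Draw the box plus a numeric "mark" the prompt can refer to.
        draw.rectangle([x1, y1, x2, y2], outline=(0, 255, 0), width=2)
        draw.text((x1 + 2, y1 + 2), str(idx), fill=(0, 255, 0))
        centers.append((int((x1 + x2) / 2), int((y1 + y2) / 2)))

    img.save("screenshot_marked.png")
    return centers  # index i = click target for mark i


# Usage: centers = mark_ui_elements("screenshot.png"); click at centers[chosen_mark]
```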