Add a grid of coordinates

Open Kreijstal opened this issue 2 years ago • 2 comments

Since it misclicks a lot, you could either train a model to do image segmentation or, with some clever prompt engineering, overlay a barebones grid on the screenshot and ask the model to solve a puzzle such as "in which coordinate can the search button be found?" This should make it more robust, right?
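A rough sketch of what the grid idea could look like, assuming a spreadsheet-style labelling scheme (columns A, B, C..., rows 1, 2, 3...). The function name `grid_cells` and the labelling are hypothetical and not taken from the project; the point is only that the model answers with a cell label and the code maps it back to a pixel to click.

```python
from string import ascii_uppercase


def grid_cells(width, height, cols, rows):
    """Map each cell label (e.g. "C2") to the pixel at the centre of that cell.

    Hypothetical helper: the labelling scheme is an assumption for illustration.
    """
    if not 1 <= cols <= len(ascii_uppercase):
        raise ValueError("cols must be between 1 and 26 for single-letter labels")
    if rows < 1:
        raise ValueError("rows must be at least 1")
    cell_w = width / cols
    cell_h = height / rows
    cells = {}
    for r in range(rows):
        for c in range(cols):
            label = f"{ascii_uppercase[c]}{r + 1}"
            # Centre of the cell, rounded to a whole pixel.
            cells[label] = (round((c + 0.5) * cell_w), round((r + 0.5) * cell_h))
    return cells


# If the model answers "C2" for a 1920x1080 screenshot with an 8x6 grid:
cells = grid_cells(1920, 1080, cols=8, rows=6)
print(cells["C2"])  # (600, 270)
```

The prompt would then list or draw the labels on the screenshot and ask for a single label as the answer, which is easier to validate than free-form coordinates.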

Kreijstal avatar Dec 01 '23 07:12 Kreijstal

See also:

  • https://github.com/OthersideAI/self-operating-computer/issues/3

add a barebones grid

I noticed that you currently seem to apply a grid to the images to assist the vision model:

Originally posted by @0xdevalias in https://github.com/OthersideAI/self-operating-computer/issues/3

0xdevalias avatar Dec 01 '23 08:12 0xdevalias

How about applying a dynamic grid approach to enhance click accuracy?

For example, we could adjust the grid density based on the proximity to the cursor. The areas closer to the cursor would have a denser grid, allowing for more accurate click predictions.
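A minimal sketch of one way to do this along a single axis, assuming the spacing between grid lines grows geometrically with distance from the cursor. The name `adaptive_lines` and the `min_step` and `growth` parameters are made up for illustration; nothing here reflects how the project actually draws its grid.

```python
def adaptive_lines(length, cursor, min_step=20, growth=1.5):
    """Return sorted grid-line positions along one axis of size `length`.

    Lines are spaced `min_step` pixels apart next to the cursor, and each
    further gap is `growth` times wider than the previous one.
    Hypothetical helper; the geometric spacing rule is an assumption.
    """
    if min_step <= 0 or growth < 1:
        raise ValueError("min_step must be positive and growth must be >= 1")
    lines = {round(cursor)}
    for direction in (-1, 1):
        pos, step = float(cursor), float(min_step)
        while True:
            pos += direction * step
            if pos < 0 or pos > length:
                break
            lines.add(round(pos))
            step *= growth
    return sorted(lines)


# Vertical grid lines for a 1000 px wide screenshot with the cursor at x=500:
print(adaptive_lines(1000, 500, min_step=20, growth=2))
# [200, 360, 440, 480, 500, 520, 560, 640, 800]
```

Calling it once with the screenshot width and once with the height gives the two sets of lines for a full grid. One trade-off is that the cells far from the cursor become large, so this probably works best when the previous click already landed near the target.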

Daisuke134 avatar Dec 05 '23 01:12 Daisuke134

Set-of-Mark prompting is now available. Swap in your own best.pt weights from a YOLOv8 model and see how it does.

joshbickett avatar Jan 07 '24 18:01 joshbickett