Add README comment on -accuracy, begin the -accuracy grid rewrite, and delete Poetry artifacts from README
Closes #77
Some preliminary work on having the model pick which grid cell to zoom into for the revamped -accuracy mode. The grid coordinates now display properly.
I plan on modifying the idea. First, I cut out a 400px x 400px area around the originally guessed location. Then the model repeatedly picks which grid option/quadrant to click on from there. Each time, I crop to the selected grid cell and upsample the image 2x before passing it to GPT again.
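A rough sketch of the first step, computing the 400px x 400px crop box around the guessed click. The `crop_box` name and its parameters are hypothetical, not code from this PR, and the actual cropping and 2x upsampling (e.g. with an image library) are left out:

```python
def crop_box(x, y, screen_w, screen_h, size=400):
    """Return (left, top, right, bottom) for a size x size box around (x, y).

    Hypothetical helper: the box is centered on the guessed point and then
    shifted as needed so it stays inside the screen.
    """
    half = size // 2
    # Clamp the top-left corner so the box does not run off any edge.
    left = min(max(x - half, 0), max(screen_w - size, 0))
    top = min(max(y - half, 0), max(screen_h - size, 0))
    right = min(left + size, screen_w)
    bottom = min(top + size, screen_h)
    return left, top, right, bottom


# A guess in the middle of a 1920x1080 screen.
box = crop_box(960, 540, 1920, 1080)
```

Shifting the box rather than shrinking it near the edges keeps the area at 400px per side, so the later zoom arithmetic stays the same wherever the guess lands.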
Adding an implementation note for my future self:
A clean way to implement picking which grid cell to zoom in on, when deciding which pixel to click, is to have the loop store the top-left and bottom-right percentages at each iteration. That way, at the end, you can just average the two percentages and return that as the pixel to click.
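Something like the following could work for the bookkeeping. This is only a sketch: `zoom_in` and `click_point` are made-up names, the bounds are fractions of the starting image, and the cell picks are hard-coded where the model's answers would go:

```python
def zoom_in(bounds, col, row, n):
    """Narrow (left, top, right, bottom) to cell (col, row) of an n x n grid."""
    left, top, right, bottom = bounds
    cell_w = (right - left) / n
    cell_h = (bottom - top) / n
    return (
        left + col * cell_w,
        top + row * cell_h,
        left + (col + 1) * cell_w,
        top + (row + 1) * cell_h,
    )


def click_point(bounds):
    """Average the top-left and bottom-right corners to get the click."""
    left, top, right, bottom = bounds
    return (left + right) / 2, (top + bottom) / 2


# Start from the whole image; each tuple is (col, row, n) for one iteration.
bounds = (0.0, 0.0, 1.0, 1.0)
for col, row, n in [(1, 0, 2), (0, 1, 2)]:
    bounds = zoom_in(bounds, col, row, n)

x_pct, y_pct = click_point(bounds)
```

Because only the two corners are carried between iterations, the crops and upsampling never need to be undone to recover the final position.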
Additionally, maybe at first do 4 divisions per side (dividing the area into 16 grid cells), but later, when more narrowed down, only do 2 divisions per side (dividing the area into fourths). Something like two rounds of 4 and two rounds of 2 will yield a final cell width of 400/(4^2 * 2^2) = 6.25 px, or a mistake of up to about 3 pixels in either dimension. That is pretty darn accurate, assuming the model picks the correct grid cell every time.
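The arithmetic can be checked with a few lines. `final_cell_size` is just an illustrative helper, and the schedule `[4, 4, 2, 2]` is the one suggested above:

```python
def final_cell_size(start_px, splits):
    """Width in pixels of the last cell after applying each per-side split."""
    size = float(start_px)
    for n in splits:
        size /= n
    return size


size = final_cell_size(400, [4, 4, 2, 2])  # 400 / 64 = 6.25
max_error = size / 2  # clicking the cell center is off by at most 3.125 px
```

The worst case is half the cell width because the click lands at the cell center.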
Additionally, look into polling the model: ask it to generate 9 responses, then choose the most popular grid selection. This guards against the chance that a single wrong grid choice gets picked.
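A minimal sketch of the polling step, assuming the 9 responses have already been parsed into grid indices. `most_popular` is a hypothetical name, and how the responses are requested from the model is not shown:

```python
from collections import Counter


def most_popular(choices):
    """Return the most common grid choice; ties go to the one seen first."""
    if not choices:
        raise ValueError("need at least one response to vote on")
    return Counter(choices).most_common(1)[0][0]


# Nine parsed responses from the model for one zoom step (made-up values).
votes = [3, 3, 5, 3, 5, 5, 3, 1, 3]
winner = most_popular(votes)
```

An odd number of responses does not rule out ties once there are more than two options, so the tie-break rule here is worth deciding deliberately.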
Hmm, I'm curious for a bit more context on this commit. I'm hoping to keep most of the non -accurate code the same when making -accurate improvements.
@joshbickett hey, sorry, just seeing this now. What do you mean by keeping most of the non -accurate code the same? I'm planning on basically having two different draw_labels. For normal mouse clicking, it shows the percentages in black with a white background. But when choosing a grid cell, I forgo the white rectangle and just display the text in green (at some point it gets zoomed in a lot, to something like a 6px x 6px range, so having a white rectangle take up pixels doesn't seem like the best idea). Especially since the model should know that the top-left corner is grid 0, and the numbering then goes down, then right (column-major order).
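For reference, the column-major numbering can be decoded like this. `cell_to_col_row` is a made-up helper for illustration, not something in the PR:

```python
def cell_to_col_row(index, n):
    """Map a column-major cell index on an n x n grid to (col, row).

    Index 0 is the top-left cell; indices count down a column first,
    then move one column to the right.
    """
    col, row = divmod(index, n)
    return col, row


# On a 4x4 grid: 0 is top-left, 1 is directly below it, 4 starts column 2.
positions = [cell_to_col_row(i, 4) for i in (0, 1, 4, 15)]
```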
@klxu03 you can ignore my last comment. I thought draw_label_with_background changed significantly but now I just see you added a condition for your -accurate method. All good, no concerns.
I am taking a closer look now. Got an error I haven't seen before when running normal operate without -accurate. Maybe a fluke, I'll look closer.
@klxu03 Tried -accurate mode on a task and got this error. I'm very interested to see where this PR goes. Let me know when you think it is ready for more testing!
+1 for accuracy mode
@klxu03 let me know if you have any updates or thoughts on this. Thanks
@klxu03 curious if you have any updates. Looks like -accurate may still have issues. May make sense to remove for now until there are updates
For sure remove it, it's likely outdated. My bad, I've been offline for a while on vacation. Returning later.
@klxu03 I did a rewrite of the project without accuracy mode. I think that multimodal models are going to solve this mouse click problem pretty soon. See CogAgent: https://arxiv.org/abs/2312.08914
I'll close this for now. If you have additional updates, let me know.