OpenAdapt Implement model cursor for visual feedback

Feature request

Update: see https://github.com/OpenAdaptAI/OpenAdapt/issues/760#issuecomment-2347337901 for the latest requirements.

We want to give the model the ability to:

paint a red dot on its suggested target location,
look at the screenshot with the dot on it, and
optionally self-correct.

Thank you @LunjunZhang for the suggestion 🙏

This involves creating a CursorReplayStrategy (based on the VanillaReplayStrategy) that implements the required prompting.
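
A minimal sketch of the propose, paint, and self-correct loop described above. Everything named here is an assumption for illustration: `ask_model` is a hypothetical callable standing in for the real prompting code, `paint_dot` and `locate_with_cursor` are not OpenAdapt functions, and the nested-list image is a stand-in for a real screenshot. An actual CursorReplayStrategy would likely look different.

```python
from typing import Callable, List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels; stand-in for a real screenshot
Point = Tuple[int, int]    # (x, y)

RED: Pixel = (255, 0, 0)


def paint_dot(image: Image, center: Point, radius: int = 2) -> Image:
    """Return a copy of `image` with a filled red dot drawn at `center`."""
    cx, cy = center
    out = [row[:] for row in image]  # copy rows so the input is not mutated
    for y in range(max(0, cy - radius), min(len(out), cy + radius + 1)):
        for x in range(max(0, cx - radius), min(len(out[y]), cx + radius + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                out[y][x] = RED
    return out


def locate_with_cursor(
    image: Image,
    ask_model: Callable[[Image, Optional[Point]], Point],
    max_rounds: int = 3,
) -> Point:
    """Propose a target, show the model its own dot, and let it self-correct.

    `ask_model(image, previous_guess)` is hypothetical: it receives the
    (possibly annotated) screenshot plus the previous guess and returns a new
    (x, y). Returning the same point means the model accepts its guess.
    """
    guess = ask_model(image, None)  # initial proposal on the clean screenshot
    for _ in range(max_rounds):
        annotated = paint_dot(image, guess)
        revised = ask_model(annotated, guess)
        if revised == guess:  # the model is satisfied with the dot's position
            break
        guess = revised
    return guess
```

Capping the loop with `max_rounds` is one possible design choice; it bounds the number of model calls when the model keeps changing its answer.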

Motivation

Correct errors, e.g. missed segmentations.

Possibly related: https://arxiv.org/abs/2406.09403:

Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms only use text as intermediate reasoning steps. In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad. The LM conducts planning and reasoning according to the visual artifacts it has drawn. ... Sketchpad substantially improves performance on all tasks over strong base models with no sketching, yielding an average gain of 12.7% on math tasks, and 8.6% on vision tasks. GPT-4o with Sketchpad sets a new state of the art on all tasks, including V*Bench (80.3%), BLINK spatial reasoning (83.9%), and visual correspondence (80.8%). All codes and data are in this https URL.

Jun 16 '24 12:06 abrichr

/bounty $1000

Jun 17 '24 00:06 abrichr

💎 $1,000 bounty • OpenAdaptAI

Steps to solve:

Start working: Comment /attempt #760 with your implementation plan
Submit work: Create a pull request including /claim #760 in the PR body to claim the bounty
Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

❗ Important guidelines:

To claim a bounty, you need to provide a short demo video of your changes in your pull request
If anything is unclear, ask for clarification before starting as this will help avoid potential rework
Low quality AI PRs will not receive review and will be closed
Do not ask to be assigned unless you've contributed before

Thank you for contributing to OpenAdaptAI/OpenAdapt!

Attempt	Started (UTC)	Solution	Actions
🟢 @blocator23	Aug 01, 2025, 08:21:18 AM	#956	Reward
🔴 @Ahmadkhan02	Jul 02, 2024, 08:09:09 PM	WIP
🟢 @onyedikachi-david	Jul 04, 2024, 10:10:18 AM	#823	Reward
🟢 @varshith257	Jul 04, 2024, 08:27:40 PM	WIP
🟢 @stdthoth	Sep 12, 2024, 08:37:31 PM	WIP
🟢 @Amanullah1002	Jun 17, 2024, 03:18:43 AM	WIP
🔴 @Subh231004	Jun 17, 2024, 06:29:42 AM	WIP
🔴 @	Jun 17, 2024, 06:31:46 AM	WIP
🟢 @hoklims	Nov 19, 2024, 04:01:52 PM	#923	Reward
🟢 @MAVRICK-1	Aug 19, 2025, 06:34:52 PM	WIP
🟢 @TanCodeX	May 29, 2025, 09:25:19 AM	#952	Reward

Jun 17 '24 00:06 algora-pbc[bot]

/attempt #760

Jun 17 '24 06:06 Subh231004

/attempt #760

Implementation Plan for Model Cursor Feedback (Issue #760)

Create CursorReplayStrategy: I'll develop a new CursorReplayStrategy class extending VanillaReplayStrategy.
Paint Red Dot: I'll implement a method to paint a red dot on the target location within a given image.
Screenshot Capture: I'll implement a method to capture a screenshot and overlay the red dot on it.
Self-Correction: I'll add an optional self-correction mechanism based on the screenshot with the dot.
Testing: I'll write and execute unit tests to ensure the functionality works as intended.
Documentation: I'll update the project documentation to include usage instructions for the new strategy.
Pull Request: I'll submit a PR for review, incorporating any feedback provided.

This plan will systematically address the issue by creating a targeted strategy, ensuring it functions correctly, and updating the documentation for users.

Jun 17 '24 06:06 Anshgrover23

@Subh231004 please keep the discussion related to your pull request on your pull request and not here. I have replied to your comment there.

Jun 20 '24 13:06 abrichr

/attempt #760

Algora profile	Completed bounties	Tech	Active attempts	Options
@onyedikachi-david	2 bounties from 1 project	JavaScript, Shell	﹟764	Cancel attempt

Jun 25 '24 15:06 onyedikachi-david

/attempt #760

Algora profile	Completed bounties	Tech	Active attempts	Options
@Ahmadkhan02	1 bounty from 1 project	TypeScript, Jupyter Notebook		Cancel attempt

Jul 02 '24 20:07 Ahmadkhan02

💡 @onyedikachi-david submitted a pull request that claims the bounty. You can visit your bounty board to reward.

Jul 04 '24 10:07 algora-pbc[bot]

/attempt #760

Algora profile	Completed bounties	Tech	Active attempts	Options
@varshith257	4 bounties from 2 projects	Python, Rust, TypeScript, Go		Cancel attempt

Jul 04 '24 20:07 varshith257

Hi @abrichr, is this still available?

Sep 12 '24 18:09 stdthoth

Hi @stdthoth, thanks for your interest.

We attempted a few different approaches at https://github.com/OpenAdaptAI/OpenAdapt/pull/867. It is available if you can implement a different approach that improves on the performance of any of these!

Sep 12 '24 19:09 abrichr

/attempt #760

Sep 12 '24 20:09 stdthoth

@abrichr I am working on it now. Could you possibly assign this to me for a week?

Sep 12 '24 20:09 stdthoth

Hi @stdthoth, thank you! Can you please clarify your request?

I just updated the description to include more details about the current approaches, recreated here:

experiments/cursor/coords.py: Uses AI prompts to iteratively locate a target in an image by drawing concentric circles.
experiments/cursor/direction.py: Moves a cursor towards a target using AI-driven direction and magnitude adjustments.
experiments/cursor/grid.py: Identifies target cells in an image by overlaying a grid and using AI feedback.
experiments/cursor/joystick.py: Adjusts a cursor's position toward a target with joystick-like AI-guided movements.
experiments/cursor/joystick_history.py: Similar to joystick.py but tracks a longer history of movements.
experiments/cursor/quadrant.py: Locates a target by iteratively narrowing down search areas in image quadrants.
experiments/cursor/sample.py: Uses AI voting to find the closest cursor to a target in an image.
experiments/cursor/search.py: Refines cursor coordinates toward a target using binary search-like AI feedback.
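
To make one of these ideas concrete, here is a rough sketch of quadrant narrowing. It is not the code in experiments/cursor/quadrant.py; `quadrant_search` and the `pick_quadrant` callable are hypothetical names, with `pick_quadrant` standing in for a model prompt that says which quadrant of the current box contains the target.

```python
from typing import Callable, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def quadrant_search(
    width: int,
    height: int,
    pick_quadrant: Callable[[Box], int],
    min_size: int = 4,
) -> Tuple[int, int]:
    """Shrink a search box by quadrants until both sides are <= `min_size`.

    `pick_quadrant(box)` is a hypothetical stand-in for a model call. It
    returns 0 (top-left), 1 (top-right), 2 (bottom-left), or 3 (bottom-right).
    """
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    left, top, right, bottom = 0, 0, width, height
    while right - left > min_size or bottom - top > min_size:
        mid_x = (left + right) // 2
        mid_y = (top + bottom) // 2
        q = pick_quadrant((left, top, right, bottom))
        # Only split an axis that is still larger than min_size, so the box
        # never collapses to zero width or height.
        if right - left > min_size:
            if q in (0, 2):
                right = mid_x
            else:
                left = mid_x
        if bottom - top > min_size:
            if q in (0, 1):
                bottom = mid_y
            else:
                top = mid_y
    # Report the center of the final box as the cursor position.
    return ((left + right) // 2, (top + bottom) // 2)
```

Each round halves the box on every axis that is still being split, so the number of model calls grows with the logarithm of the image size. A single wrong answer is unrecoverable in this naive form, which is one reason the approaches need to be compared empirically.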

I believe the next step here is to systematically evaluate the performance of these in a repeatable way (e.g. programmatically). Only then will we be able to implement the requirement to:

implement a different approach that improves on the performance of any of these

Please let me know if you have any questions!

Edit: if you prefer, you can also implement a novel approach, without evaluating these ones. But we will be unable to award the bounty until we can confirm that your approach outperforms all of these.
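
One possible shape for such a repeatable evaluation is sketched below. The `evaluate` function, the `Strategy` signature, and the metric names are assumptions made for illustration; a real harness would give each strategy a screenshot with a hidden target rather than passing the target in directly.

```python
import math
import random
import statistics
from typing import Callable, Dict, Tuple

Point = Tuple[int, int]
# A strategy takes (width, height, target) and returns its predicted (x, y).
# The target is passed directly only so this sketch runs without a model.
Strategy = Callable[[int, int, Point], Point]


def evaluate(
    strategies: Dict[str, Strategy],
    trials: int = 100,
    size: Point = (640, 480),
    hit_radius: float = 10.0,
    seed: int = 0,
) -> Dict[str, Dict[str, float]]:
    """Run every strategy on the same seeded targets and summarize errors."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    width, height = size
    targets = [(rng.randrange(width), rng.randrange(height)) for _ in range(trials)]
    report: Dict[str, Dict[str, float]] = {}
    for name, strategy in strategies.items():
        errors = [math.dist(strategy(width, height, t), t) for t in targets]
        report[name] = {
            "mean_error_px": statistics.fmean(errors),
            "hit_rate": sum(e <= hit_radius for e in errors) / trials,
        }
    return report
```

Because all strategies see the same seeded targets, their `mean_error_px` and `hit_rate` values can be compared directly, for example `evaluate({"center": lambda w, h, t: (w // 2, h // 2)})` as a trivial baseline.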

Edit: https://visualsketchpad.github.io/ may perform very well.

Sep 12 '24 22:09 abrichr

/attempt https://github.com/OpenAdaptAI/OpenAdapt/issues/760

Nov 19 '24 14:11 hoklims

💡 @hoklims submitted a pull request that claims the bounty. You can visit your bounty board to reward.

Nov 19 '24 16:11 algora-pbc[bot]

Hi, is this issue still open?

Jan 13 '25 11:01 Girma35

Hey @abrichr, is this issue still available? If it is, I would like to give it a try according to the updated requirements in https://github.com/OpenAdaptAI/OpenAdapt/issues/760#issuecomment-2347337901. Should I go straight to a PR with my approach, or should I discuss the approach first?

May 17 '25 05:05 neoandmatrix

@neoandmatrix thank you for your interest!

Please propose an approach first. Note that it should be distinct from those already implemented.

May 23 '25 20:05 abrichr

/attempt #760

May 29 '25 09:05 TanCodeX

/attempt #760

Aug 01 '25 08:08 blocator23

I'll develop source code based on a classification model and a Python library in order to:

Classify the types of image files and visual expressions.
Measure color, size, brightness, and edits.
Use machine learning to support analysis and predictions.

Aug 03 '25 10:08 blocator23

/attempt #760

Aug 19 '25 18:08 MAVRICK-1