Implement model cursor for visual feedback
Feature request
Update: see https://github.com/OpenAdaptAI/OpenAdapt/issues/760#issuecomment-2347337901 for the latest requirements.
We want to be able to give the model the ability to:
- paint a red dot on its suggested target location,
- look at the screenshot with the dot on it,
- optionally self-correct.
Thank you @LunjunZhang for the suggestion!
This involves creating a CursorReplayStrategy (based on the VanillaReplayStrategy) that implements the required prompting.
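As a rough illustration of the "paint a red dot" step, here is a minimal sketch using Pillow; `paint_red_dot` is a hypothetical helper for this issue, not part of OpenAdapt's actual API, and the real CursorReplayStrategy would pass the annotated screenshot back into the prompt loop.

```python
# Hypothetical sketch of the dot-painting step; names are illustrative,
# not OpenAdapt's actual API.
from PIL import Image, ImageDraw


def paint_red_dot(image: Image.Image, x: int, y: int, radius: int = 5) -> Image.Image:
    """Return a copy of `image` with a red dot centered at (x, y)."""
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    draw.ellipse(
        (x - radius, y - radius, x + radius, y + radius),
        fill="red",
    )
    return annotated
```

The model would then be shown the annotated screenshot and asked whether the dot sits on the intended target, and if not, to propose corrected coordinates.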
Motivation
Correct errors, e.g. missed segmentations.
Possibly related: https://arxiv.org/abs/2406.09403:
Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms only use text as intermediate reasoning steps. In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad. The LM conducts planning and reasoning according to the visual artifacts it has drawn. ... Sketchpad substantially improves performance on all tasks over strong base models with no sketching, yielding an average gain of 12.7% on math tasks, and 8.6% on vision tasks. GPT-4o with Sketchpad sets a new state of the art on all tasks, including V*Bench (80.3%), BLINK spatial reasoning (83.9%), and visual correspondence (80.8%). All codes and data are in this https URL.
/bounty $1000
💎 $1,000 bounty • OpenAdaptAI
Steps to solve:
- Start working: Comment `/attempt #760` with your implementation plan
- Submit work: Create a pull request including `/claim #760` in the PR body to claim the bounty
- Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts
Important guidelines:
- To claim a bounty, you need to provide a short demo video of your changes in your pull request
- If anything is unclear, ask for clarification before starting as this will help avoid potential rework
- Low quality AI PRs will not receive review and will be closed
- Do not ask to be assigned unless you've contributed before
Thank you for contributing to OpenAdaptAI/OpenAdapt!
| Attempt | Started (UTC) | Solution | Actions |
|---|---|---|---|
| 🟢 @blocator23 | Aug 01, 2025, 08:21:18 AM | #956 | Reward |
| 🔴 @Ahmadkhan02 | Jul 02, 2024, 08:09:09 PM | WIP | |
| 🟢 @onyedikachi-david | Jul 04, 2024, 10:10:18 AM | #823 | Reward |
| 🟢 @varshith257 | Jul 04, 2024, 08:27:40 PM | WIP | |
| 🟢 @stdthoth | Sep 12, 2024, 08:37:31 PM | WIP | |
| 🟢 @Amanullah1002 | Jun 17, 2024, 03:18:43 AM | WIP | |
| 🔴 @Subh231004 | Jun 17, 2024, 06:29:42 AM | WIP | |
| 🔴 @ | Jun 17, 2024, 06:31:46 AM | WIP | |
| 🟢 @hoklims | Nov 19, 2024, 04:01:52 PM | #923 | Reward |
| 🟢 @MAVRICK-1 | Aug 19, 2025, 06:34:52 PM | WIP | |
| 🟢 @TanCodeX | May 29, 2025, 09:25:19 AM | #952 | Reward |
/attempt #760
Implementation Plan for Model Cursor Feedback (Issue #760)
- Create CursorReplayStrategy: I'll develop a new CursorReplayStrategy class extending VanillaReplayStrategy.
- Paint Red Dot: I'll implement a method to paint a red dot on the target location within a given image.
- Screenshot Capture: I'll implement a method to capture a screenshot and overlay the red dot on it.
- Self-Correction: I'll add an optional self-correction mechanism based on the screenshot with the dot.
- Testing: I'll write and execute unit tests to ensure the functionality works as intended.
- Documentation: I'll update the project documentation to include usage instructions for the new strategy.
- Pull Request: I'll submit a PR for review, incorporating any feedback provided.

This plan will systematically address the issue by creating a targeted strategy, ensuring it functions correctly, and updating the documentation for users.
@Subh231004 please keep the discussion related to your pull request on your pull request and not here. I have replied to your comment there.
/attempt #760
| Algora profile | Completed bounties | Tech | Active attempts | Options |
|---|---|---|---|---|
| @onyedikachi-david | 2 bounties from 1 project | JavaScript, Shell | #764 | Cancel attempt |
/attempt #760
| Algora profile | Completed bounties | Tech | Active attempts | Options |
|---|---|---|---|---|
| @Ahmadkhan02 | 1 bounty from 1 project | TypeScript, Jupyter Notebook | | Cancel attempt |
💡 @onyedikachi-david submitted a pull request that claims the bounty. You can visit your bounty board to reward.
/attempt #760
| Algora profile | Completed bounties | Tech | Active attempts | Options |
|---|---|---|---|---|
| @varshith257 | 4 bounties from 2 projects | Python, Rust, TypeScript, Go | | Cancel attempt |
Hi @abrichr, is this still available?
Hi @stdthoth , thanks for your interest.
We attempted a few different approaches at https://github.com/OpenAdaptAI/OpenAdapt/pull/867. It is available if you can implement a different approach that improves on the performance of any of these!
@abrichr I am working on it now... could you possibly assign this to me for a week?
Hi @stdthoth , thank you! Can you please clarify your request?
I just updated the description to include more details about the current approaches, recreated here:
- experiments/cursor/coords.py: Uses AI prompts to iteratively locate a target in an image by drawing concentric circles.
- experiments/cursor/direction.py: Moves a cursor towards a target using AI-driven direction and magnitude adjustments.
- experiments/cursor/grid.py: Identifies target cells in an image by overlaying a grid and using AI feedback.
- experiments/cursor/joystick.py: Adjusts a cursor's position toward a target with joystick-like AI-guided movements.
- experiments/cursor/joystick_history.py: Similar to joystick.py but tracks a longer history of movements.
- experiments/cursor/quadrant.py: Locates a target by iteratively narrowing down search areas in image quadrants.
- experiments/cursor/sample.py: Uses AI voting to find the closest cursor to a target in an image.
- experiments/cursor/search.py: Refines cursor coordinates toward a target using binary search-like AI feedback.
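To make the iterative idea behind these experiments concrete, here is a hedged sketch of the quadrant-narrowing approach (in the spirit of `experiments/cursor/quadrant.py`); the `ask_model_for_quadrant` callback stands in for the actual AI prompt and the function names are assumptions, not the code in the repo.

```python
# Illustrative sketch of quadrant narrowing: repeatedly ask which
# quadrant of the current search box contains the target, then shrink
# the box to that quadrant. The oracle callback is a stand-in for the
# real vision-model prompt.
from typing import Callable, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom


def locate_by_quadrants(
    box: Box,
    ask_model_for_quadrant: Callable[[Box], str],
    min_size: int = 2,
) -> Tuple[int, int]:
    """Narrow `box` until it is small, then return its center."""
    left, top, right, bottom = box
    while right - left > min_size or bottom - top > min_size:
        mid_x, mid_y = (left + right) // 2, (top + bottom) // 2
        quadrant = ask_model_for_quadrant((left, top, right, bottom))
        # Keep only the half indicated on each axis.
        if "left" in quadrant:
            right = mid_x
        else:
            left = mid_x
        if "top" in quadrant:
            bottom = mid_y
        else:
            top = mid_y
    return (left + right) // 2, (top + bottom) // 2
```

Each answer halves the search area on both axes, so convergence is logarithmic in the image size; the trade-off is one model call per iteration.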
I believe the next step here is to systematically evaluate the performance of these in a repeatable way (e.g. programmatically). Only then will we be able to implement the requirement to:
implement a different approach that improves on the performance of any of these
Please let me know if you have any questions!
Edit: if you prefer, you can also implement a novel approach, without evaluating these ones. But we will be unable to award the bounty until we can confirm that your approach outperforms all of these.
Edit: https://visualsketchpad.github.io/ may perform very well.
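The repeatable, programmatic evaluation described above could be sketched roughly as follows; the `Strategy` signature and function names are assumptions for illustration, with the ground-truth target standing in for a full screenshot-plus-prompt loop.

```python
# Hypothetical harness for comparing cursor-locating strategies by
# mean pixel error over a shared set of targets.
from statistics import mean
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, int]
# A strategy receives the ground-truth target (a stand-in for the real
# screenshot and prompting loop) and returns its predicted coordinates.
Strategy = Callable[[Point], Point]


def euclidean_error(predicted: Point, target: Point) -> float:
    """Pixel distance between a prediction and the ground truth."""
    dx, dy = predicted[0] - target[0], predicted[1] - target[1]
    return (dx * dx + dy * dy) ** 0.5


def evaluate(strategies: Dict[str, Strategy], targets: List[Point]) -> Dict[str, float]:
    """Return each strategy's mean pixel error over the same targets."""
    return {
        name: mean(euclidean_error(strategy(t), t) for t in targets)
        for name, strategy in strategies.items()
    }
```

Running every approach in `experiments/cursor/` through a shared harness like this would make "improves on the performance of any of these" a checkable claim rather than a judgment call.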
/attempt https://github.com/OpenAdaptAI/OpenAdapt/issues/760
💡 @hoklims submitted a pull request that claims the bounty. You can visit your bounty board to reward.
Hi, is this issue still open?
Hey @abrichr, is this issue still available? If so, I would like to give it a try according to the updated requirements in https://github.com/OpenAdaptAI/OpenAdapt/issues/760#issuecomment-2347337901. Should I go directly to a PR with my approach, or discuss the approach first?
@neoandmatrix thank you for your interest!
Please propose an approach first. Note that it should be distinct from those already implemented.
/attempt #760
/attempt #760
I'll develop source code based on a classification model and a Python library in order to:
- Classify the types of image files and visual expressions.
- Measure color, size, brightness, and edits.
- Use machine learning to support analysis and predictions.
/attempt #760