OpenAdapt feat: Implement CursorReplayStrategy with Visual Feedback and Self-Correction

feat: Implement CursorReplayStrategy with Visual Feedback and Self-Correction

Open TanCodeX opened this issue 6 months ago • 1 comments

/claim #760

What kind of change does this PR introduce?

Feature: New cursor replay strategy with visual feedback and self-correction

Summary

This PR addresses #760 by introducing a new cursor replay strategy that improves targeting accuracy using visual feedback and AI-powered self-correction.

Key Features:

Red dot visual feedback system for suggested target points
AI-powered accuracy analysis via OpenAI models
Self-correction mechanism based on visual feedback
Grid-based movement with recursive refinement for higher precision
Robust testing framework to measure accuracy, actions, and performance

This strategy sets the groundwork for improving OpenAdapt’s cursor control system in complex screen environments.

Checklist

[x] My code follows OpenAdapt's style guidelines
Follows PEP 8
Uses consistent naming conventions
Maintains existing project structure
[x] Self-reviewed my code
Verified edge cases
Validated parameter types
Checked error handling
[x] Added tests
test_grid.py evaluates grid strategy
Metrics for accuracy, actions, and time
Test cases for various screen regions
[x] Linted code
Used flake8 for Python linting
Fixed all issues
Removed unused imports
[x] Commented the code
Explained AI logic
Documented grid algorithm
Clarified self-correction behavior
[x] Updated documentation
Added docstrings for all methods/classes
Updated requirements.txt
Included usage examples in comments
[x] All new and existing tests pass locally
Visual feedback tests
Grid strategy accuracy checks
OpenAI API integration tests

How can your code be run and tested?

Install dependencies:

pip install -r requirements.txt

Run the grid evaluation:

python -m experiments.cursor.test_grid

Example Output:

Grid Strategy Evaluation Results:
---------------------------------
Total test cases: 45
Average distance error: 5.2 pixels
Average actions per target: 4.3
Average time per target: 0.82 seconds

Results by grid size:
Grid size: 2x2
  Average error: 8.4 pixels
  Average actions: 3.0
  Average time: 0.65 seconds

Grid size: 4x4
  Average error: 4.2 pixels
  Average actions: 4.5
  Average time: 0.85 seconds

Grid size: 8x8
  Average error: 3.1 pixels
  Average actions: 5.5
  Average time: 0.96 seconds

Test specific components:

from openadapt.strategies.cursor import CursorReplayStrategy
from experiments.cursor.grid import GridCursorStrategy

# Visual feedback
strategy = CursorReplayStrategy(recording)
img_with_dot = strategy.paint_dot(screenshot, x=100, y=100)

# Grid approach
grid_strategy = GridCursorStrategy(recording, grid_size=(4, 4))
action = grid_strategy.get_next_action_event(screenshot, window_event)

Dependencies:

opencv-python for visual processing
numpy for grid calculations
openai for visual feedback evaluation

May 29 '25 10:05 TanCodeX

@TanCodeX thank you for your contribution! Can you please show some example output (e.g. video, screenshot, console text)?

Aug 18 '25 21:08 abrichr

OpenAdapt OpenAdapt copied to clipboard

feat: Implement CursorReplayStrategy with Visual Feedback and Self-Correction

OpenAdapt
OpenAdapt copied to clipboard