Question: use vision AI with element position and click

Open guillegette opened this issue 1 year ago • 1 comments

Hi team,

Sorry for creating an issue about this, please let me know if there is a better place to share thoughts and questions.

I am really enjoying this project, but I can already see many cases where the framework is not able to solve the request. Is there a reason why we couldn't use an AI vision model: give it a screenshot of the web page, ask "where is the red button?", get coordinates back, and pass those to the browser to click at that position?
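To make the idea concrete, here is a rough sketch of the flow, not anything Stagehand currently ships. `PageLike`, `LocateFn`, `toViewportPoint`, and `clickByDescription` are hypothetical names made up for illustration; `locate` stands in for whatever vision model call would return pixel coordinates. The `PageLike` shape is modeled on Playwright's `page.screenshot()` and `page.mouse.click(x, y)`. It also assumes the screenshot is captured at device resolution, so the model's coordinates need to be divided by the device scale factor before clicking.

```typescript
// Minimal structural types so the sketch has no dependencies.
// Assumption: a Playwright Page satisfies this shape.
interface PageLike {
  screenshot(): Promise<Uint8Array>;
  mouse: { click(x: number, y: number): Promise<void> };
}

interface Point {
  x: number;
  y: number;
}

// Hypothetical: sends the screenshot plus a prompt to a vision model and
// resolves to pixel coordinates in the screenshot, or null if nothing matched.
type LocateFn = (screenshot: Uint8Array, prompt: string) => Promise<Point | null>;

// Convert screenshot pixels to viewport (CSS) pixels.
function toViewportPoint(p: Point, deviceScaleFactor: number): Point {
  return { x: p.x / deviceScaleFactor, y: p.y / deviceScaleFactor };
}

// Take a screenshot, ask the model where the target is, then click there.
// Resolves to the clicked viewport point, or null if the model found nothing.
async function clickByDescription(
  page: PageLike,
  locate: LocateFn,
  prompt: string,
  deviceScaleFactor: number = 1
): Promise<Point | null> {
  const shot = await page.screenshot();
  const hit = await locate(shot, prompt);
  if (hit === null) {
    return null; // nothing to click
  }
  const target = toViewportPoint(hit, deviceScaleFactor);
  await page.mouse.click(target.x, target.y);
  return target;
}
```

Usage would look something like `await clickByDescription(page, myVisionModel, "where is the red button")`, where `myVisionModel` is whatever function wraps the model call.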

Would love to hear your thoughts about this.

guillegette avatar Oct 30 '24 23:10 guillegette

Hey @guillegette! Thanks for sharing. I think this would be a great modifier to the useVision argument (useVisionCoordinates, maybe?). We're discussing this in the community Slack here: https://stagehand-dev.slack.com/archives/C07UCP76U8G/p1730392286230649

pkiv avatar Oct 31 '24 16:10 pkiv

Vision has been scrapped for now. Any updates here?

revmag avatar Jun 27 '25 12:06 revmag

@revmag check out CUA agent support!

miguelg719 avatar Nov 03 '25 03:11 miguelg719