[Feature]: Introduce AI locators and actions
🚀 Feature Request
There's an interesting library that could serve as inspiration for adding either an AI locators feature or prompt-based actions.
You may find the Stagehand project to be a valuable resource. It offers insightful examples that could inspire the integration of AI locator features or prompt-based actions.
Example
https://docs.stagehand.dev/get_started/introduction
Motivation
Users could expedite test creation, with basic operations such as clicking buttons, activating toggles, and typing becoming less dependent on static locators. Additionally, actions like data extraction from a page could be more comprehensible to individuals without a programming background, as they are articulated in plain language.
What if the AI chooses the wrong locator? Are you aware that quality assurance (or quality control) is about being deterministic about what is and is not expected by the humans who use the product?
What if the AI chooses the wrong locator?
If a company is prepared for the risks associated with using AI, then there should be no issues. It is ultimately the decision of the users who utilize the tool. In some projects, speed is more important than the X% risk that AI might click the wrong control.
I tried the mentioned tool and did not encounter any problems with simple actions (clicking a button, typing, or switching locators). However, there were definite issues when it came to data extraction.
Currently, it is not a requirement due to the limitations of AI and possibly the cost. However, with the rapid advancements in AI in recent years, it might evolve faster than predicted.
From my perspective, AI will be used in test automation; the question is which framework will manage to make it work reliably.
This ticket is intended to encourage the team to consider this possibility and explore what is already available on the market to remain the leading test automation framework.
Can't we also use enableCaching in Stagehand to get some determinism?
I could also see the framework outputting a test file.
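To make the caching idea concrete, here is a minimal sketch of how a prompt could be resolved to a selector once and then replayed deterministically on later runs. This is not Stagehand's actual enableCaching implementation, whose internals are not described in this thread. The names CachedSelectorResolver, SelectorResolver, selectorFor and snapshot are hypothetical, and the AI call is injected as a plain function so the sketch runs without any model or browser.

```typescript
// Hypothetical resolver: turns a plain-language prompt into a concrete selector.
// A real one would call an LLM; it is injected here so the sketch runs offline.
type SelectorResolver = (prompt: string) => Promise<string>;

class CachedSelectorResolver {
  private readonly cache = new Map<string, string>();
  private readonly resolver: SelectorResolver;

  constructor(resolver: SelectorResolver, saved: Record<string, string> = {}) {
    this.resolver = resolver;
    // Preload selectors recorded by an earlier run.
    for (const prompt of Object.keys(saved)) {
      const selector = saved[prompt];
      if (selector !== undefined) {
        this.cache.set(prompt, selector);
      }
    }
  }

  // Returns the recorded selector if the prompt was seen before;
  // otherwise asks the resolver once and remembers the answer.
  async selectorFor(prompt: string): Promise<string> {
    const key = prompt.trim();
    const cached = this.cache.get(key);
    if (cached !== undefined) {
      return cached;
    }
    const selector = await this.resolver(key);
    this.cache.set(key, selector);
    return selector;
  }

  // Plain object that can be written to disk and reviewed like any other fixture.
  snapshot(): Record<string, string> {
    const out: Record<string, string> = {};
    this.cache.forEach((selector, prompt) => {
      out[prompt] = selector;
    });
    return out;
  }
}
```

In a Playwright test, the result could be passed straight to page.locator(), for example page.locator(await resolver.selectorFor("the login button")). Because the snapshot is a plain prompt-to-selector map, it could be committed next to the tests, so a human can review which control the AI picked. This addresses part of the wrong-locator concern, although a cached selector can still go stale when the page changes.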
I just released: https://github.com/lila-team/ai-locators
AI Locator for Playwright. This library is available for Python and Node. It adds a custom selector engine that makes it possible to write:
await page.locator("ai=the login button").click()
Really interesting motivation — I’ve been exploring a similar idea focused on turning human-driven browser workflows into resilient, reusable APIs, especially for multi-step UI flows (e.g. portals with no APIs, legacy dashboards, etc.).
Stagehand’s AI locator and prompt-based action model is a solid reference — though in practice I’ve noticed fragility even with AI locators when DOM structure is highly dynamic or semantically poor. Have you had success for the test creation use case? Thanks!