playwright-bdd
BDD + MCP: Your thoughts?
Hey community!
I've been exploring how we might integrate the growing MCP technology with BDD testing.
The Playwright MCP server offers a set of tools that are actually low-level BDD steps, like navigate, click, etc. I feel there's real potential here to build something where feature files could be executed without manually defining each step.
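To make that concrete, here is a rough sketch of how imperative step texts might be routed to Playwright MCP tools. The tool names follow the Playwright MCP server (browser_navigate, browser_click, browser_type), but the matching rules are entirely made up for illustration:

```javascript
// Hypothetical sketch: map imperative Gherkin steps onto Playwright MCP tools.
// The tool names follow the Playwright MCP server; the patterns are invented.
const stepToTool = [
  { pattern: /^I am on (.+)$/, tool: 'browser_navigate', args: (m) => ({ url: m[1] }) },
  { pattern: /^I click (?:link |button )?"(.+)"$/, tool: 'browser_click', args: (m) => ({ element: m[1] }) },
  { pattern: /^I type "(.+)" into "(.+)"$/, tool: 'browser_type', args: (m) => ({ text: m[1], element: m[2] }) },
];

function resolveStep(stepText) {
  for (const { pattern, tool, args } of stepToTool) {
    const m = stepText.match(pattern);
    if (m) return { tool, args: args(m) };
  }
  return null; // step needs a hand-written definition (or an AI fallback)
}

console.log(resolveStep('I click link "Get started"'));
// → { tool: 'browser_click', args: { element: 'Get started' } }
```

Steps that match nothing would fall through to a regular step definition, so hand-written and auto-resolved steps could coexist.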
The exact approach is still fuzzy, but I'd love to hear your thoughts. How do you think MCP could best fit into a BDD workflow?
This is something that my company is actively looking at and interested in. We have a spike project around this.
This is something my colleagues and company are interested in; however, the licensing and privacy would need to be looked into. There is some obvious apprehension about any information being passed on through the prompts. I think if this feature becomes optional it alleviates a lot of that concern, as we could always switch it off if deemed a bad fit for us.
I'm also looking into this. I found this framework which I find has a nice link between writing code for things you are sure of and using natural language for the more unknown/new things: https://docs.stagehand.dev/get_started/introduction They also have an MCP server: https://docs.stagehand.dev/integrations/mcp-server
Then my idea would be: people (or even an agentic AI) write scenarios based on a three-amigos discussion -> the scenarios are passed to the Stagehand MCP -> Stagehand understands them based on context and executes the test.
I will need to do more investigation to see if this is correct and useful.
In my experience with the Playwright MCP server in combination with VS Code Copilot (Claude Sonnet 4), I saw that it could execute the same test cases that I wrote down as a feature perfectly. The Gherkin I employed was imperative (and very strict), and it seems to provide enough limitations for the test case to run as expected. The next step for me is to see whether a more declarative statement would be sufficient, e.g. can a user story without any Gherkin be enough to get to a test case.
I do think that running regression testing via an AI agent is not desirable; however, an initial AI pipeline or just a local process could be used to run and generate a test case which can subsequently be added to a regression set.
Considering that, what I noticed is that the Copilot agent seems to prefer using PowerShell and messes up running certain scripts (even if they are in the package.json). Perhaps a playwright-bdd MCP can help with this: inserting the context of bddgen directly into the agent, thus limiting what kind of code should be generated (alternatively, perhaps this can be achieved via an instructions file).
In general I personally feel that LLMs are designed so that they can do everything. However, I would prefer to see a specifically trained model which is good at a subset of the work that I do, like writing features (based on some ISTQB approaches, for example) or creating step definitions or page objects.
Edit: @vitalets I was thinking about your comment, and perhaps a VS Code extension which also interfaces with an agent could be nice. It would allow autocompleting pseudocode of a step (which translates directly to the playwright-mcp interface) and then offer a button to run it via an agent. An instruction can be included via the prompt, or the extension can have a config which refers to an instruction file.
Edit2: Looking at the flow diagram of Copilot https://code.visualstudio.com/api/extension-guides/ai/tools#tool-calling-flow, feeding the right information to the Copilot toolset might make the most sense. Currently I let the AI make some changes, run it, and in case of failure manually copy-paste the fixWithAi output into the agent; the AI fixes it, and then I run it again. If this flow could be fully automated, that would be great.
Hi @vitalets, I already did this some time back with auto-browse, where I have given an example of how to use playwright-bdd: https://typescript.docs.auto-browse.com/usage/bdd-mode I have been hoping to connect with you for some time, but have been a bit busy.
@cloudbring , @Co-Din Take a look and let me know your thoughts - https://typescript.docs.auto-browse.com/usage/bdd-mode It uses playwright-bdd.
Thanks
I came across an article about Cypress’s new cy.prompt() that accepts free-text steps (including BDD-style):
cy.prompt(
  [
    'Given the user is on the "/register" page',
    'When the user enters "Avery Lopez" in the name field',
    'And the user enters "[email protected]" in the email field',
    'And the user enters {{password}} in the password field',
    'And the user checks the terms and conditions checkbox',
    'And the user clicks the "Register User" button',
    'Then the user should see a success notification'
  ],
  { placeholders: { password: Cypress.env('PASSWORD') } }
)
What I really like: AI is used once, then the generated step code is cached. The cache invalidates only on failures or step changes. Placeholders let you inject dynamic data into cached steps.
Idea: “Auto-steps mode” for playwright-bdd
You write only feature files, and AI generates the step implementations. Generated steps are cached and re-generated only when needed. The existing .features-gen output could serve as the cache.
Example
Feature:
Feature: Playwright Home Page

  Scenario: Check title
    Given I am on Playwright home page
    When I click link "Get started"
    Then I see in title "Installation"
First run: playwright-bdd calls Playwright MCP to generate code per step and writes .features-gen/homepage.feature.spec.js:
test.describe('Playwright Home Page', () => {
  test('Check title', async ({ Given, When, Then, page }) => {
    await Given('I am on Playwright home page', async () => {
      // generated step start
      await page.goto('https://playwright.dev');
      // generated step end
    });
    await When('I click link "Get started"', async () => {
      // generated step start
      await page.getByRole('link', { name: 'Get started' }).click();
      // generated step end
    });
    await Then('I see in title "Installation"', async () => {
      // generated step start
      await expect(page).toHaveTitle(/Installation/);
      // generated step end
    });
  });
});
Under the hood, a real browser should be launched to access the accessibility tree and produce reliable locators. This is something Playwright MCP should support.
Subsequent runs: if nothing changed, we just execute the generated code (no AI calls).
Add a new step: only that step is generated and appended.
Then I see in title "Installation"
+ And I see "Install from npm" in the page body
Modify a step: only that step is re-generated.
- When I click link "Get started"
+ When I navigate to installation page
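The partial-regeneration decision itself is simple bookkeeping (names invented): compare the previous and current step texts, and only steps that are new or changed go back through the AI.

```javascript
// Hypothetical helper: given the step texts from the previous and current
// versions of a scenario, return only the steps that need (re-)generation.
// Unchanged steps keep their cached code blocks.
function stepsToRegenerate(oldSteps, newSteps) {
  const cached = new Set(oldSteps);
  return newSteps.filter((step) => !cached.has(step));
}

const before = [
  'I am on Playwright home page',
  'I click link "Get started"',
  'I see in title "Installation"',
];
const after = [
  'I am on Playwright home page',
  'I navigate to installation page', // modified step
  'I see in title "Installation"',
];
console.log(stepsToRegenerate(before, after)); // → [ 'I navigate to installation page' ]
```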
Challenges
- Currently playwright-bdd re-generates all test files on each bddgen call. It should update only changed regions instead.
- To achieve that, we should reliably bind steps in the feature file to specific generated code blocks, so we can invalidate just those blocks.
- Integrate Playwright MCP into the generation phase (the bddgen command).
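The // generated step start / end markers in the generated file could serve as those binding anchors. A rough sketch (the regex and helper are my own, not playwright-bdd internals) of replacing just one step's block:

```javascript
// Hypothetical helper: swap out the generated body of a single step, located
// via the marker comments already present in the generated spec file.
function replaceGeneratedBlock(source, stepTitle, newBody) {
  const escaped = stepTitle.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // matches: await Given('title', async () => { // generated step start ... // generated step end
  const re = new RegExp(
    `(await \\w+\\('${escaped}', async \\(\\) => \\{\\s*// generated step start)[\\s\\S]*?(// generated step end)`
  );
  return source.replace(re, `$1\n${newBody}\n    $2`);
}
```

Binding by step title (or a hash of it) would also make it possible to detect which blocks to invalidate when the feature file changes.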
Open questions
- Can some steps remain hand-written and be permanently skipped by AI?
- How do we pass custom fixtures into generated steps? If a fixture is renamed or its shape changes, how should regeneration adapt?
Would this be useful in your projects? I’d love feedback on the caching model and the partial regeneration approach.
My experience with playwright-mcp is that it exposes Playwright functionality as tools for an AI session / code assistant. It can give an AI instance access to a browser (and let you tell it to do stuff).
Right now my own process is as follows: generate or take an existing/new test case; prompt GitHub Copilot to execute the test case using playwright-mcp and save traces/ARIA snapshots (or alternatively use a pipeline with an AI endpoint instead of Copilot); prompt GitHub Copilot to implement the automation based on the project instructions (design pattern).
So I am wondering what you mean by playwright-bdd calling Playwright MCP?
Maybe it's possible to have a playwright-bdd MCP that has a tool which reads a specific BDD config (or a bdd.prompt, or a tag) and then stores the relevant Playwright code output that playwright-mcp gives. Perhaps it can run bddgen in a loop and let the AI fill in the steps with the Playwright code that playwright-mcp outputs.
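If I understand the idea, the loop could look roughly like this (every name here is hypothetical; playwright-bdd has no such API today):

```javascript
// Hypothetical outline of a bddgen loop that fills missing steps via an agent
// holding a Playwright MCP session. None of these names are real playwright-bdd APIs.
async function fillMissingSteps(scenario, { hasDefinition, askAgentForCode, saveStep }) {
  for (const step of scenario.steps) {
    if (hasDefinition(step)) continue; // hand-written steps are left alone
    // the agent drives a real browser through MCP tools and returns Playwright code
    const code = await askAgentForCode(step);
    saveStep(step, code); // persisted so the next bddgen run skips the AI call
  }
}
```

Each iteration would leave the scenario a bit more complete, until every step has either a hand-written or a stored generated implementation.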