[Feature] Split shards via test timing data
Playwright has great default behavior around sharding tests across multiple workers. It would be very helpful if sharding could take a --timing-data file, similar to CircleCI's split-tests logic, so that Playwright can internally split tests along timing boundaries. This would let Playwright finish as fast as possible by distributing tests more optimally across multiple machines or workers.
This is the one thing that has made me hesitate switching from Cypress to Playwright despite the many advantages.
Cypress Cloud solves this with their Smart Orchestration (specifically, the "Load Balancing" strategy).
This is one of the major limitations of Playwright.
I would love to see this implemented in Playwright. It is a fairly common feature among other test platforms and tooling.
I'd also love to see this feature. I thought of creating a PR with the following additions:
- Add a '--timing-file' flag, e.g. --timing-file="report.json"
- The file holds the duration and ID of each test
- Based on the durations, we build groups with totals as close to each other as possible, then assign tests to the groups by test ID
So far I've tried using the JSON reporter and extracting this data from it, which seems to work, but it's a bit messy because of the arbitrarily nested suites you need to traverse. I'm not sure whether I'd want to generate a smaller timing file from the JSON reporter, or just use the JSON report file and extract the data in the Playwright runner. I also considered creating a completely new reporter that only records the test name, test ID, and duration.
Based on my testing so far, we can create a pretty good balance for any number of shards. More details are probably needed, but please let me know whether this is something that could be implemented; otherwise I'll create a custom external solution instead.
Thanks.
A small-scale example, current implementation:
New balance based on json report:
In most cases, since the shards run in parallel, what matters is when the last shard finishes. Having one shard finish in 35 seconds and another in 1.5 minutes is as good as having two shards both run for 1.5 minutes.
How the new shard balancing works:
- You point to a JSON report file
- From the report we extract the duration of each test by its ID
- Based on the total number of shards, we group the tests by duration
- Test groups are created from the new mapping
- New tests that are not recorded in the report are split evenly, using the current implementation
This currently works for any number of shards.
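The steps above amount to a greedy "longest test first, lightest shard next" assignment. Here is an illustrative sketch of that balancing, not the actual PR code; the { id, duration } entry shape is assumed from the --timing-file proposal above:

```typescript
// Sketch of duration-based shard balancing (greedy longest-processing-time).
// Timing entries (id + duration) are assumed to come from a recorded report.
interface TimingEntry {
  id: string;
  duration: number; // milliseconds
}

function balanceShards(tests: TimingEntry[], shardCount: number): TimingEntry[][] {
  const shards: TimingEntry[][] = Array.from({ length: shardCount }, () => []);
  const totals: number[] = new Array(shardCount).fill(0);
  // Longest tests first; each goes to the currently lightest shard.
  for (const test of [...tests].sort((a, b) => b.duration - a.duration)) {
    const lightest = totals.indexOf(Math.min(...totals));
    shards[lightest].push(test);
    totals[lightest] += test.duration;
  }
  return shards;
}
```

New tests with no recorded duration could then be appended round-robin, as the proposal suggests for unrecorded tests.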
We have implemented something similar for Currents https://docs.currents.dev/guides/pw-parallelization/playwright-orchestration
@ofirpardo-artlist how did you end up doing this? We would greatly benefit from this.
@sfrique Since Playwright closed my PR without any real explanation, I just use patch-package (https://www.npmjs.com/package/patch-package) to make the changes on the Playwright side: https://pastebin.com/vTy3k0uU
I can just upload the patch file if it will be easier to read perhaps.
I've created a custom reporter: https://pastebin.com/HpFxQKmZ
This gives me a JSON file entry for each passed test with the following: test ID, test name, and test duration (technically this could be changed to include failed/flaky tests, but I don't think that's a good approach).
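For reference, the core of such a reporter can be sketched with Playwright's reporter hooks (onTestEnd receives the per-test result, whose duration is in milliseconds). This is an illustrative sketch, not the linked pastebin code, and the output filename is made up:

```typescript
// Sketch of a timing reporter (illustrative, not the pastebin code).
// Duck-types Playwright's Reporter interface: onTestEnd(test, result)
// is called once per finished test with result.duration in ms.
import * as fs from 'fs';

class TimingReporter {
  private entries: { id: string; name: string; duration: number }[] = [];

  onTestEnd(test: { id: string; title: string }, result: { status: string; duration: number }) {
    // Record only passed tests, matching the approach described above.
    if (result.status === 'passed') {
      this.entries.push({ id: test.id, name: test.title, duration: result.duration });
    }
  }

  onEnd() {
    fs.writeFileSync('timing-report.json', JSON.stringify(this.entries, null, 2));
  }
}

export default TimingReporter;
```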
Then you can just point the timing file to that report, e.g:
npx playwright test --timing-file=/results.json
Personally, I also merge multiple reports every few CI runs for a better average, so the timing data keeps updating itself to stay accurate: https://pastebin.com/6xe0uHbk
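Merging several timing reports into a rolling average can be as simple as averaging durations per test ID. A sketch under the assumed { id, duration } entry shape (not the linked pastebin code):

```typescript
// Sketch: average durations per test ID across several timing reports,
// so occasional slow or fast runs are smoothed out.
interface Timing {
  id: string;
  duration: number;
}

function mergeTimingReports(reports: Timing[][]): Timing[] {
  const sums = new Map<string, { total: number; count: number }>();
  for (const report of reports) {
    for (const { id, duration } of report) {
      const s = sums.get(id) ?? { total: 0, count: 0 };
      s.total += duration;
      s.count += 1;
      sums.set(id, s);
    }
  }
  // Emit the mean duration per test ID.
  return [...sums].map(([id, s]) => ({ id, duration: s.total / s.count }));
}
```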
And I also use multiple shards so my merge.config.ts file looks like this:
export default {
  reporter: [
    ['./custom-reporter.ts', { outputFile: 'report.json', outputPath: 'e2e/blob-report' }],
  ],
};
Feel free to ask any questions.
@ofirpardo-artlist I've also created multiple PRs trying to improve sharding over the past couple of months, but all of them were either closed or reverted after being merged.
- https://github.com/microsoft/playwright/pull/30962
- https://github.com/microsoft/playwright/pull/33049
- https://github.com/microsoft/playwright/pull/30817
- https://github.com/microsoft/playwright/pull/31260
Let me collect the responses given here so we can have a discussion.
@pavelfeldman said: We've discussed it at length during the team meeting. The consensus was that we don't see a low-maintenance solution to the problem that would cover a meaningful number of use cases. Even the mature solutions that track execution timing and reuse this information in subsequent runs are brittle and have a tendency to rot. We would go for a lower-level solution that would allow users to take over the scheduling (test list files or API calls), but we don't want to see a new extensive API surface or file formats there. I'm following up for transparency - we are thinking about the problem, but we don't see a good maintainable solution atm.
@dgozman said: We would really like a low level API to also be able to control the order of test execution. For example, one could imagine a strategy that would "run last failures first, and only if all of them pass - run all other tests". Ideally, we want strategies like this to be implementable on top of the proposed API.
@pavelfeldman said: We are looking for the ways to cover more use cases (with hooks) with a smaller api surface / maintenance cost. For example, allow test lists for shards so that users could use third party solutions to tune their exact configuration. Or allow a callback that would take over scheduling tests. Developing those to be easy in maintenance requires consideration and time that we can't currently allocate to the problem. But we are very open to keeping this communication in case a nice proposal comes up.
@pavelfeldman said: We know that people want test lists, but we are struggling with committing to a persistent test id that would be used in those (it becomes a part of our contract). Lists are also sub-optimal for those interested in sharding, as shards have different lists. But many more customers are interested in custom failure retries where test lists are very useful, so it might be that committing to it is worth it. […] I like having lower-level primitives that allow for greater flexibility for power users to tune Playwright to their definition of perfection. Much more than having a handful of suboptimal presets that will only work for a couple of customers.
@dgozman @pavelfeldman I have to admit that I had not seen @ofirpardo-artlist's https://github.com/microsoft/playwright/pull/30388 when I started working on my own solution. I really like that PR for its simplicity…
- https://github.com/microsoft/playwright/pull/30388
Let's continue the discussion with the goal of splitting shards based on timing data.
There is a new proposal which could be helpful to allow users to implement their own sharding logic…
- https://github.com/microsoft/playwright/issues/33386
Really hope you'll manage to get it through. I personally gave up on it since they don't seem to be really interested in the feature, and I quite like my own solution using patch-package and a custom reporter. It makes everything very easy to upload to S3 and reuse from S3. It's now even simpler than the original PR I created, which took the whole JSON report and parsed it. Hope to see something merged soon 👍
After reading a lot of the comments in this repository around randomness and sharding, there are two solutions that people are asking for:
- Randomness, to verify that tests are isolated and order-independent.
- Shard balancing based on test timing.
Both are real needs, but they are very distinct; shard balancing based on test timing is, in fact, the opposite of randomness. Underneath, however, both rely on taking the test files and turning them into a nested array describing which tests run in which shard. If an interface can be agreed on, the randomness option would be a quick way to prove out the interface for balancing.
It feels as though randomness could be achieved in a much simpler way, though, e.g. by a command-line argument specifying a seed value.
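Such a seeded ordering is indeed small to implement. A sketch using a Fisher-Yates shuffle driven by a simple LCG (the --seed flag itself is hypothetical; Playwright has no such option today):

```typescript
// Sketch: deterministic Fisher-Yates shuffle driven by a numeric seed,
// e.g. taken from a hypothetical --seed CLI flag. Same seed => same order.
function seededShuffle<T>(items: T[], seed: number): T[] {
  let state = seed >>> 0;
  // Minimal linear congruential generator as the PRNG.
  const next = (): number => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 0x100000000; // uniform in [0, 1)
  };
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(next() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}
```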
Hi, I’m among those hoping this feature will eventually be integrated into Playwright. I’ve seen a lot of activity and PRs around this, and I want to express my respect for all the hard work that’s gone into it.
On my end, I've been building a tool that implements this feature without requiring any modifications to Playwright core. It also supports Vitest and Jest, not just Playwright. Repo: https://github.com/nissy-dev/tenbin
So far, I’ve only tested it with some simple cases, but it’s been working well. Please feel free to try it out if you're interested!
You might want to comment or up-vote comments on what Playwright should concentrate on in 2025:
- https://github.com/microsoft/playwright/issues/33955#issuecomment-2562679995
Hey all, I wanted to update this thread with an option that doesn't embed this functionality into Playwright itself, which we use today to provide the functionality within our projects. The general workflow is:
- Remove any usage of Playwright's internal sharding in favor of the CI provider's parallelism strategy
- Ensure all Playwright executions export results in a standard format (e.g. JUnit XML)
- Store the reports for each run, particularly against the main branch
- Invoke the CI provider's test-splitting tool
While I understand the trade-off that moving between CI providers becomes less trivial, this gives you all the functionality you're looking for now, without waiting for Playwright to support it internally. Even if sharding gained better timing support, in most cases we would still need the CI provider's test-result storage to re-gather this data. Plus, as mentioned in https://github.com/microsoft/playwright/issues/17969#issuecomment-2309619576, any given team might want to apply its own heuristic to the test results, such as using the most recent 5 results to smooth out timing data.
I would also note that, for any shops that have multiple languages, consolidating to a single report format (and tool) for test reporting is very advantageous.
Example: CircleCI
For CircleCI, you can make use of circleci tests glob and circleci tests run:
playwright:
  executor: base-executor
  resource_class: medium+
  working_directory: app
  parallelism: 4
  steps:
    - run:
        name: Run playwright against << parameters.environment >>
        command: |
          PLAYWRIGHT_COMMAND="yarn e2e:preprod ..."
          TESTFILES=$(circleci tests glob "playwright/tests/**/*.test.ts")
          echo $TESTFILES | circleci tests run --command="xargs $PLAYWRIGHT_COMMAND" --verbose --split-by=timings
    - store_test_results:
        path: app/reports
NOTE: our Playwright configuration explicitly enables JUnit reporting to a fixed output location.
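For completeness, pinning the JUnit reporter to a fixed output file looks like this in playwright.config.ts (the output path here is illustrative and must match what your CI step stores):

```typescript
// playwright.config.ts: emit JUnit XML to a fixed location so the CI
// provider can pick it up (e.g. via CircleCI's store_test_results).
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [
    ['junit', { outputFile: 'reports/junit-results.xml' }],
  ],
});
```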
Example: GitHub Actions
This example action (shoutout @r7kamura) makes use of the split-test CLI to always be able to Glob a set of test files and cross-reference JUnit XML data outward. I'm sure there are a myriad of ways to replicate the same functionality.
@ffluk3 thanks for sharing your use case, which is a really helpful workaround for the current limitations of the Playwright test runner.
Am I correct to assume this implies tests are split per test file instead of per test case?
Playwright's configuration allows multiple projects (e.g. different browser versions, locales, test user accounts, etc.), which can have different globs for test files, and of course test files can contain varying numbers of tests. With current Playwright sharding we can say that 1/4 of all test cases should run on a given shard, and it does that across all project configurations and discovered tests… It looks like you lose that to some extent with your proposed workaround. WDYT?
@muhqu Yes that is totally fair - the approach we take imposes some limitations depending on the structure used, and for our use cases we stick to a single browser. I would argue that one could structure a set of projects to also run as different jobs and each use the test splitting, if that option were available, but that further removes the ability to use playwright's built-in orchestration tooling. Thus far for us, we have not seen issue with having our own orchestration on top of running single-project configurations per job, but I could see the use case.
@ffluk3 but I think what Playwright could do, to work even better the way you're using it, would be to improve the playwright test --list command to produce output that can later be consumed by playwright test. Then you would not need to split based on files and could instead split based on actual test cases. 🤔
Playwright does have a --list flag, I'll see if I can fit it better into my approach.
Here are a few options people might find helpful. "Run e2e tests split by timing (per testcase)" is the one really relevant here.
However, in this example each test case is identified by its line number in the test file, which is not ideal: if you modify your tests, the line numbers change and previous timing data is lost.
Additionally, tests are not split across browsers: each test runs against all configured browsers on the same machine rather than being distributed separately.
CircleCI Configuration
- run:
    name: Run e2e tests split by timing (per testcase)
    command: npx playwright test --list | awk -F' › ' '{print $2}' | sort -u | sed 's/:[0-9]\+$//' | circleci tests run --command="xargs npx playwright test" --verbose --split-by=timings
- run:
    name: Run e2e tests split by timing (per test file)
    command: circleci tests glob "playwright/**/*.spec.ts" | circleci tests run --command="xargs npx playwright test" --verbose --split-by=timings
- run:
    name: Run e2e tests split by shards (per testcase)
    command: SHARD="$((${CIRCLE_NODE_INDEX}+1))"; npx playwright test --shard=${SHARD}/${CIRCLE_NODE_TOTAL}
Update
The above "Run e2e tests split by timing (per testcase)" doesn't actually work, because CircleCI matches the input of the circleci tests run command against the attributes in the JUnit report, and you won't find line numbers there.
Here's what we ended up doing.
- List the Playwright tests with npx playwright test --list
- Extract the test names (everything after the second ›)
- Write these into a file (we had problems passing them to circleci tests split because of the spaces; an intermediate file worked for us)
- Split the file via circleci tests split --split-by=timings --timings-type=testname testnames.txt
- List the test cases again via npx playwright test --list, match the names from the split back to filename + line number, and pass those to npx playwright test
Here is an example with processing in node:
- run:
    name: Prepare temporary list of all playwright tests
    command: node tools/playwright/list-test-names.js | tee testnames.txt
- run:
    name: Run e2e tests split by timing (per testcase)
    command: circleci tests split --split-by=timings --timings-type=testname testnames.txt | node tools/playwright/extract-testcase-lines-by-test-names.js | xargs npx playwright test
list-test-names.js
const { execSync } = require('child_process');

const rawTestList = execSync('npx playwright test --list', { encoding: 'utf8' });

// keep only test lines and drop the first two '›'-separated segments
// (project and file:line:column), leaving just the test name
const testList = rawTestList
  .split('\n')
  .filter((line) => line.includes('›'))
  .map((line) => line.split(' › ').slice(2).join(' › '));

const finalList = testList.join('\n');
console.log(finalList);
extract-testcase-lines-by-test-names.js
const listOfAllTests = require('child_process')
  .execSync('npx playwright test --list', { encoding: 'utf8' })
  .split('\n')
  .filter((line) => line.includes('›'))
  .map((line) => line.trim());

// read the full input stream (the test names selected for this shard)
const stdinBuffer = require('fs').readFileSync(0).toString().trim();

stdinBuffer.split('\n').forEach((testCaseLine) => {
  // find the test-list entries containing this test name
  const relevantTestcases = listOfAllTests.filter((test) => test.includes(testCaseLine));
  if (relevantTestcases.length === 0) {
    throw new Error(`No test found for: ${testCaseLine}`);
  }
  if (relevantTestcases.length > 1) {
    console.log(relevantTestcases);
    throw new Error(`Multiple tests found for: ${testCaseLine}... Choose a different name`);
  }
  const relevantTestcase = relevantTestcases[0];
  // extract the "file:line:column" part of the matched entry
  const regex = /\s(\S+:\d+:\d+)/;
  const testFileWithLineFromFirstMatchgroup = relevantTestcase.match(regex)[1];
  console.log(testFileWithLineFromFirstMatchgroup);
});
I hope this workaround helps someone.
@dgozman any chance to get any further with this?
It's been over a year since I created a PR that supports duration-round-robin sharding, and several teams at our company (Adobe) are using it via patch-package.
- https://github.com/microsoft/playwright/pull/30962
Just now, I created an updated patch for playwright 1.54.1 as your users asked for it…
@dgozman Sad that the --filter implementation has been rolled back. Is there an existing alternative, or why was it removed?
@sizzle168 We've decided to replace it with --last-run-file option instead. See #37209. You can specify filterTests property there and the filter will be applied.
We assume this should be easy enough to generate with a script, by having a custom reporter and running npx playwright test --list --reporter=./myreporter to generate a list of test ids according to any criteria you'd like.
It would be great if you could give it a try by installing a canary release and share your feedback before the next stable release. Thank you!
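A minimal sketch of such a list-generating reporter might look like this. The shape of the written file (a filterTests array of test IDs) is an assumption based on the description above; check the --last-run-file documentation for the exact format:

```typescript
// Sketch: a reporter intended for `npx playwright test --list --reporter=./myreporter`
// that writes a last-run-style file containing a filterTests list of test IDs.
// The file shape is an assumption, not confirmed against the actual feature.
import * as fs from 'fs';

class FilterFileReporter {
  // Duck-types Playwright's Reporter interface: onBegin receives the root
  // suite, and suite.allTests() yields every discovered TestCase.
  onBegin(_config: unknown, suite: { allTests(): { id: string }[] }) {
    const ids = suite.allTests().map((t) => t.id);
    fs.writeFileSync('last-run.json', JSON.stringify({ filterTests: ids }, null, 2));
  }
}

export default FilterFileReporter;
```

Any selection criteria (e.g. only previously slow tests, or a per-shard subset) would go into the mapping step before writing the file.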
I am finding this a major obstacle with Playwright as well. I have seen shards unbalanced by as much as 10x, and am having to do a lot of custom work to rebalance them. I know the playwright team wants to make this pluggable so people can define their own algorithms, but in the meantime Playwright seriously underperforms. I tried adapting @muhqu's patch to 1.55.0 but ran into errors on running merge-reports, so am now forced to consider a downgrade to 1.54.1.
I have seen this recently but haven't tried it: https://docs.currents.dev/guides/ci-optimization/playwright-parallelization#playwright-orchestration. Paid service though.
@dgozman as I understand it, to implement timing-based shard balancing we essentially have to script the balancing ourselves by writing different tests into a last-run file for each shard, rather than using built-in sharding.
Does this have any implications for report merging? I'm using Playwright in GitHub Actions and upload the merged report to GitHub Pages, which is really handy. Will this still work if I switch to --last-run-file-based sharding instead of --shard=n/m?
If yes, I think I'll package up a time-balancing implementation so that it's easy to just npm install and we don't all have to write the same script :)
The solutions mentioned above for CircleCI do not work correctly (other than patching Playwright).
This is because CircleCI depends on the filename attribute in the JUnit output, but for some reason Playwright seems to consider this entirely standard field a "CI provider specific" thing and won't support it in the built-in reporter.
I think Cypress actually has a similar limitation, which we fixed using this package: https://github.com/ksocha/cypress-circleci-reporter
I've ported that reporter to Playwright here: https://npmjs.com/settings/alexstapleton/packages and it does get the bin-packing working with the CircleCI tools. I have to run it like this to get the paths to line up, but YMMV:
TESTFILES=$(circleci tests glob "packages/main/tests/e2e/**/*.spec.ts" | sed "s|^|$(pwd)/|" | circleci tests split --split-by=timings | sed "s|^$(pwd)/packages/main/||")
NODE_ENV=test pnpm --dir packages/main run playwright:run --max-failures 5 $TESTFILES
@gpaciga, I've created a new patch for playwright 1.56.1 with the changes from:
- https://github.com/microsoft/playwright/pull/30962
Complete diff: https://github.com/microsoft/playwright/compare/v1.56.1...muhqu:playwright:sharding-algorithm-v1.56.1