playwright
playwright copied to clipboard
feat(test runner): improve sharding algorithm to better spread similar tests among shards
Adds alternative algorithms to assign test groups to shards to better distribute tests.
Problem
Currently the way sharding works is something like this…
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Shard 1: ^---------^ : [ 1, 2, 3 ]
Shard 2: ^---------^ : [ 4, 5, 6 ]
Shard 3: ^---------^ : [ 7, 8, 9 ]
Shard 4: ^---------^ : [ 10,11,12 ]
Tests are ordered in the way they are discovered, which is mostly alphabetically. This has the effect that test cases are sorted nearby similar tests… for example your have first 6 tests which are testing logged-in state and then 6 tests which test logged-out state. The first 6 tests require more setup time as they are testing logged-in behaviour… With the current sharding algorithm shard 1 & 2 get those slow logged-in tests and shard 3 & 4 get the more quicker tests…
Solution
This PR adds a new shardingMode configuration which allows to specify the sharding algorithm to be used…
shardingMode: 'partition'
That's the current behaviour, which is the default. Let me know if you have a better name to describe the current algorithm...
shardingMode: 'round-robin'
Distribute the test groups more evenly. It…
- sorts test groups by number of tests in descending order
- then loops through the test groups and assigns them to the shard with the lowest number of tests.
Here is a simple example where every test group represents a single test (e.g. --fully-parallel) ...
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Shard 1: ^ ^ ^ : [ 1, 5, 9 ]
Shard 2: ^ ^ ^ : [ 2, 6,10 ]
Shard 3: ^ ^ ^ : [ 3, 7,11 ]
Shard 4: ^ ^ ^ : [ 4, 8,12 ]
…or a more complex scenario where test groups have different number of tests…
Original Order: [ [1], [2, 3], [4, 5, 6], [7], [8], [9, 10], [11], [12] ]
Sorted Order: [ [4, 5, 6], [2, 3], [9, 10], [1], [7], [8], [11], [12] ]
Shard 1: ^-----^ : [ [ 4, 5, 6] ]
Shard 2: ^--^ ^ : [ [ 2, 3], [8] ]
Shard 3: ^---^ ^ : [ [ 9, 10], [11] ]
Shard 4: ^ ^ ^ : [ [1], [7], [12] ]
shardingMode: 'duration-round-robin'
It's very similar to round-robin, but it uses the duration of a tests previous run as cost factor. The duration will be read from .last-run.json when available. When a test can not be found in .last-run.json it will use the average duration of available tests. When no last run info is available, the behaviour would be identical to round-robin.
Other changes
- Add
testDurations?: { [testId: string]: number }to.last-run.json - Add builtin
lastrunreporter, which allowsmerge-reportsto generate a.last-run.jsonto be generated
Appendix
Below are some runtime stats from a project I've been working on, which shows the potential benefit of this change.
The tests runs had to complete 161 tests. Single test duration ranges from a few seconds to over 2 minutes.
The partition run gives the baseline performance and illustrates the problem quite good. We have a single shard that takes almost 16 min while another one completes in under 5 min.
The round-robin algorithm gives a bit better performance, but it still has a shard that requires twice the time of another shard.
The duration-round-robin run was using the duration info from a previous run and achieves the best result by far. All shards complete in 10-11 minutes. 🏆 🎉
Maybe it's better to make this an option to allow restoring the old behaviour. ¯_(ツ)_/¯
~And… there should be unit-tests, no?~ found them…
Test results for "tests 1"
37 failed :x: [playwright-test] › playwright.ct-react.spec.ts:253:5 › should pass "key" attribute from JSX in variable :x: [playwright-test] › runner.spec.ts:118:5 › should ignore subprocess creation error because of SIGINT :x: [playwright-test] › shard.spec.ts:66:5 › should respect shard=1/2 :x: [playwright-test] › shard.spec.ts:80:5 › should respect shard=2/2 :x: [playwright-test] › shard.spec.ts:107:5 › should respect shard=2/3 :x: [playwright-test] › shard.spec.ts:119:5 › should respect shard=3/3 :x: [playwright-test] › shard.spec.ts:131:5 › should respect shard=3/4 :x: [playwright-test] › shard.spec.ts:151:5 › should respect shard=1/2 in config :x: [playwright-test] › shard.spec.ts:170:5 › should work with workers=1 and --fully-parallel :x: [playwright-test] › shard.spec.ts:66:5 › should respect shard=1/2 :x: [playwright-test] › shard.spec.ts:80:5 › should respect shard=2/2 :x: [playwright-test] › shard.spec.ts:107:5 › should respect shard=2/3 :x: [playwright-test] › shard.spec.ts:119:5 › should respect shard=3/3 :x: [playwright-test] › shard.spec.ts:131:5 › should respect shard=3/4 :x: [playwright-test] › shard.spec.ts:151:5 › should respect shard=1/2 in config :x: [playwright-test] › shard.spec.ts:170:5 › should work with workers=1 and --fully-parallel :x: [playwright-test] › shard.spec.ts:66:5 › should respect shard=1/2 :x: [playwright-test] › shard.spec.ts:80:5 › should respect shard=2/2 :x: [playwright-test] › shard.spec.ts:107:5 › should respect shard=2/3 :x: [playwright-test] › shard.spec.ts:119:5 › should respect shard=3/3 :x: [playwright-test] › shard.spec.ts:131:5 › should respect shard=3/4 :x: [playwright-test] › shard.spec.ts:151:5 › should respect shard=1/2 in config :x: [playwright-test] › shard.spec.ts:170:5 › should work with workers=1 and --fully-parallel :x: [playwright-test] › shard.spec.ts:66:5 › should respect shard=1/2 :x: [playwright-test] › shard.spec.ts:80:5 › should respect shard=2/2 :x: [playwright-test] › shard.spec.ts:107:5 › should respect shard=2/3 :x: [playwright-test] › shard.spec.ts:119:5 › should respect shard=3/3 :x: [playwright-test] › shard.spec.ts:131:5 › should respect shard=3/4 :x: [playwright-test] › shard.spec.ts:151:5 › should respect shard=1/2 in config :x: [playwright-test] › shard.spec.ts:170:5 › should work with workers=1 and --fully-parallel :x: [playwright-test] › shard.spec.ts:66:5 › should respect shard=1/2 :x: [playwright-test] › shard.spec.ts:80:5 › should respect shard=2/2 :x: [playwright-test] › shard.spec.ts:107:5 › should respect shard=2/3 :x: [playwright-test] › shard.spec.ts:119:5 › should respect shard=3/3 :x: [playwright-test] › shard.spec.ts:131:5 › should respect shard=3/4 :x: [playwright-test] › shard.spec.ts:151:5 › should respect shard=1/2 in config :x: [playwright-test] › shard.spec.ts:170:5 › should work with workers=1 and --fully-parallel
1 flaky
:warning: [firefox-page] › page/page-request-continue.spec.ts:481:3 › continue should not change multipart/form-data body26993 passed, 610 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
Test results for "tests 1"
13 failed :x: [playwright-test] › reporter-json.spec.ts:45:5 › should not report skipped due to sharding :x: [playwright-test] › reporter-junit.spec.ts:193:9 › created › should report skipped due to sharding :x: [playwright-test] › reporter-junit.spec.ts:193:9 › merged › should report skipped due to sharding :x: [playwright-test] › reporter-json.spec.ts:45:5 › should not report skipped due to sharding :x: [playwright-test] › reporter-junit.spec.ts:193:9 › created › should report skipped due to sharding :x: [playwright-test] › reporter-junit.spec.ts:193:9 › merged › should report skipped due to sharding :x: [playwright-test] › reporter-json.spec.ts:45:5 › should not report skipped due to sharding :x: [playwright-test] › reporter-junit.spec.ts:193:9 › created › should report skipped due to sharding :x: [playwright-test] › reporter-junit.spec.ts:193:9 › merged › should report skipped due to sharding :x: [playwright-test] › reporter-json.spec.ts:45:5 › should not report skipped due to sharding :x: [playwright-test] › reporter-junit.spec.ts:193:9 › created › should report skipped due to sharding :x: [playwright-test] › reporter-junit.spec.ts:193:9 › merged › should report skipped due to sharding :x: [playwright-test] › reporter-json.spec.ts:45:5 › should not report skipped due to sharding
1 flaky
:warning: [webkit-page] › page/workers.spec.ts:243:3 › should support offline23160 passed, 434 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
Test results for "tests 1"
2 failed :x: [playwright-test] › reporter-junit.spec.ts:193:9 › created › should report skipped due to sharding :x: [playwright-test] › reporter-junit.spec.ts:193:9 › merged › should report skipped due to sharding
17194 passed, 290 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
Test results for "tests 1"
1 failed :x: [installation tests] › playwright-component-testing.spec.ts:55:5 › pnpm: JSX inside a @playwright/test should work
2 flaky
:warning: [firefox-page] › page/page-request-continue.spec.ts:481:3 › continue should not change multipart/form-data body:warning: [playwright-test] › ui-mode-test-progress.spec.ts:165:5 › should update tracing network live
27053 passed, 610 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
Test results for "tests 1"
1 failed :x: [firefox-page] › page/page-request-continue.spec.ts:481:3 › continue should not change multipart/form-data body
1 flaky
:warning: [webkit-page] › page/page-goto.spec.ts:277:3 › should fail when navigating to bad SSL after redirects27054 passed, 610 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
Do you think you can achieve the same better behavior with your sharding seed? Or are you looking for additional bias against subsequent tests being put into the same group?
Do you think you can achieve the same better behavior with your sharding seed?
Not sure yet. But I will test this new sharding logic in our test setup to gather some results.
Or are you looking for additional bias against subsequent tests being put into the same group?
The seeded shuffle is basically just a quick and easy way to influence the test group to shard assignment… it's random and so it's results may vary.
However, this change is aimed to improve the sharding logic to generally yield better results, which yet needs to be proved. 😅
Currently this sharding algorithm uses the number of tests per test group as a cost metric. It would be great if we could use the test duration of a previous run (when available) to even better distribute the tests among the shards. But the algorithm would be quite similar.
I think your seed change allows users to experiment with the seeds and arrive at some better state than they are today. Any other changes w/o the timing feedback are going to yield similar results, not need to experiment with biases.
It would be great if we could use the test duration of a previous run (when available) to even better distribute the tests among the shards.
This requires a feedback loop with the test time stats which we don't have today. We recently started storing the last run stats in .last-run.json, I think it is Ok to store the test times there and to use it on the subsequent runs for better sharding, if it is available. Would you be interested in working on it?
Yes, I would like to work on that.
I was not yet aware of the .last-run.json. Is that something that is also written by the merge reports command? Because we need the stats combined from all shard runs.
I was thinkings about adding a separate reporter for that purpose, but if those last run stats are already there…, then there might not be the need to create a separate reporter.
I was thinkings about adding a separate reporter for that purpose, but if those last run stats are already there…, then there might not be the need to create a separate reporter.
Shaping this code as reporter sounds good, but Playwright core would need to consume the output of that reporter, so it needs to be baked in. Merging those should not be a hard problem, reporter or not. Unfortunately merging mangles test ids today, so we'd need to figure that out. Maybe not using the ids altogether and falling back to the file names and test titles. Also has some tricky edge cases as tests that are fast on Chromium and are slow on Firefox...
Test results for "tests 1"
2 failed :x: [playwright-test] › playwright.ct-react.spec.ts:253:5 › should pass "key" attribute from JSX in variable :x: [playwright-test] › esm.spec.ts:530:5 › should resolve .js import to .tsx file in ESM mode for components
1 flaky
:warning: [webkit-library] › library/browsercontext-clearcookies.spec.ts:92:3 › should remove cookies by domain27099 passed, 609 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
I've added a lastrun reporter that can be used with merge-reports to generate .last-run.json.
Surprisingly when merging the reports the test ids just had a 1 character suffix that I was able to strip off… but it doesn't feel like the right way to do this.
What's the reason to modify test ids when merging blobs? Couldn't this be done in a way that only modifies a test id when there is a collision?
Test results for "tests 1"
1 failed :x: [playwright-test] › playwright.ct-react.spec.ts:253:5 › should pass "key" attribute from JSX in variable
3 flaky
:warning: [chromium-library] › library/trace-viewer.spec.ts:240:1 › should have network requests:warning: [chromium-library] › library/browsertype-connect.spec.ts:320:5 › run-server › should set the browser connected state
:warning: [playwright-test] › ui-mode-test-screencast.spec.ts:21:5 › should show screenshots
27159 passed, 609 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
Test results for "tests 1"
1 failed :x: [playwright-test] › playwright.ct-react.spec.ts:253:5 › should pass "key" attribute from JSX in variable
2 flaky
:warning: [chromium] › components/splitView.spec.tsx:34:5 › should render sidebar first:warning: [webkit-page] › page/workers.spec.ts:243:3 › should support offline
27160 passed, 609 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
@pavelfeldman last-run.json is a little bit tricky to work with atm… it constantly gets overwritten even if you just list tests and VS Code seems to do that from time to time to refresh the tests panel... I think it should only get written when it's actually running tests.
I think it should only get written when it's actually running tests.
Agreed. Note that for the PGO-alike behavior, we would probably explicitly point to the file and commit it to the repo. Would be the same format as last-run, but user would copy it over into some playwright/ folder and point to it explicitly.
Test results for "tests 1"
1 failed :x: [playwright-test] › playwright.ct-react.spec.ts:253:5 › should pass "key" attribute from JSX in variable
27178 passed, 609 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
Test results for "tests 1"
3 flaky
:warning: [chromium-library] › library/trace-viewer.spec.ts:240:1 › should have network requests:warning: [firefox-page] › page/frame-goto.spec.ts:46:3 › should continue after client redirect
:warning: [firefox-page] › page/page-event-request.spec.ts:171:3 › should return response body when Cross-Origin-Opener-Policy is set
27176 passed, 609 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
Test results for "tests 1"
1 failed :x: [playwright-test] › playwright.ct-react.spec.ts:253:5 › should pass "key" attribute from JSX in variable
3 flaky
:warning: [chromium] › components/splitView.spec.tsx:34:5 › should render sidebar first:warning: [webkit-library] › library/browsercontext-clearcookies.spec.ts:72:3 › should remove cookies by name regex
:warning: [playwright-test] › ui-mode-test-watch.spec.ts:145:5 › should watch all
27191 passed, 608 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
Test results for "tests 1"
15 passed :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
Test results for "tests 1"
27195 passed, 608 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
@pavelfeldman […] we would probably explicitly point to the file […]
Do you have a recommendation how you would name cli parameter / configuration option?
Something like --sharding-read-last-run-info path/to/merged-last-run.json is probably too much of a mouth full? However, I think we should somehow make it obvious when data is read or written to a file…
Test results for "tests 1"
27501 passed, 608 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
This is awesome - we were planning on implementing something similar outside playwright before we migrate cypress tests that currently shard across 50 build servers.
Is there anything todo here I could help with to get this merged quicker ?
Hey, I did not noticed this PR got conflicts due to the revert of my other change… I’ll resolve those conflicts in the next days. There was still on open question on whether we should introduce a command line flag to specify a last run state file or not… but I think that are improvements that can also happen after this has been merged
Though, this PR could have some more documentation changes… anyone want to help with that?
Test results for "tests 1"
15 failed :x: [chromium] › components/codeMirrorWrapper.spec.tsx:72:5 › highlight JavaScript :x: [chromium] › components/codeMirrorWrapper.spec.tsx:77:5 › highlight Python :x: [chromium] › components/codeMirrorWrapper.spec.tsx:82:5 › highlight Java :x: [chromium] › components/codeMirrorWrapper.spec.tsx:87:5 › highlight C# :x: [chromium] › components/codeMirrorWrapper.spec.tsx:92:5 › highlight lines :x: [chromium] › components/expandable.spec.tsx:23:5 › should render collapsed :x: [chromium] › components/expandable.spec.tsx:30:5 › should render expanded :x: [chromium] › components/expandable.spec.tsx:37:5 › click should expand :x: [chromium] › components/splitView.spec.tsx:23:5 › should render :x: [chromium] › components/splitView.spec.tsx:34:5 › should render sidebar first :x: [chromium] › components/splitView.spec.tsx:45:5 › should render horizontal split :x: [chromium] › components/splitView.spec.tsx:56:5 › should hide sidebar :x: [chromium] › components/splitView.spec.tsx:65:5 › drag resize :x: [chromium] › shared/imageDiffView.spec.tsx:38:5 › should render links :x: [chromium] › shared/imageDiffView.spec.tsx:47:5 › should show diff by default
0 passed :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
Test results for "tests 1"
1 failed :x: [playwright-test] › reporter-html.spec.ts:389:5 › created › should use different path if attachments base url option is provided
2 flaky
:warning: [chromium-library] › library/capabilities.spec.ts:137:3 › should not crash on showDirectoryPicker:warning: [playwright-test] › ui-mode-test-watch.spec.ts:145:5 › should watch all
28476 passed, 649 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
Test results for "tests 1"
7 flaky
:warning: [chromium-library] › library/browsercontext-fetch-happy-eyeballs.spec.ts:49:3 › get should work on request fixture:warning: [chromium-library] › library/browsercontext-fetch-happy-eyeballs.spec.ts:49:3 › get should work on request fixture
:warning: [chromium-library] › library/browsercontext-fetch-happy-eyeballs.spec.ts:49:3 › get should work on request fixture
:warning: [firefox-library] › library/browsercontext-fetch-happy-eyeballs.spec.ts:49:3 › get should work on request fixture
:warning: [chromium-library] › library/browsercontext-fetch-happy-eyeballs.spec.ts:49:3 › get should work on request fixture
:warning: [playwright-test] › ui-mode-test-screencast.spec.ts:21:5 › should show screenshots
:warning: [webkit-library] › library/browsercontext-fetch-happy-eyeballs.spec.ts:49:3 › get should work on request fixture
28487 passed, 649 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
Merge workflow run.
@dgozman I've just added --last-run-file= cli flag and CI seems to be happy… any further work needed on this PR to get merged?