playwright feat(test runner): improve sharding algorithm to better spread similar tests among shards

trafficstars

Adds alternative algorithms for assigning test groups to shards, so that tests are distributed more evenly across shards.

Problem

Currently, sharding works roughly like this:

         [  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]
Shard 1:  ^---------^                                      : [  1, 2, 3 ]
Shard 2:              ^---------^                          : [  4, 5, 6 ]
Shard 3:                          ^---------^              : [  7, 8, 9 ]
Shard 4:                                      ^---------^  : [ 10,11,12 ]

Tests are ordered the way they are discovered, which is mostly alphabetical. As a result, similar tests end up next to each other. For example, you might first have 6 tests that exercise the logged-in state, followed by 6 tests that exercise the logged-out state. The first 6 tests need more setup time because they test logged-in behaviour. With the current sharding algorithm, shards 1 and 2 get all of the slow logged-in tests, while shards 3 and 4 get the quicker ones.
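To make the imbalance concrete, here is a minimal TypeScript sketch of contiguous partition sharding applied to the logged-in/logged-out example. It is not Playwright's actual implementation: `TestGroup`, `partitionShards` and the 60s/10s durations are hypothetical, and test groups are reduced to plain lists of test ids.

```typescript
// Hypothetical, simplified model for illustration: a test group is just a
// list of test ids (real Playwright test groups carry much more information).
type TestGroup = { tests: string[] };

// Sketch of contiguous ("partition") sharding: give each shard a consecutive
// slice of roughly total/shardCount tests, in discovery order. A group goes to
// the shard whose slice contains the group's first test.
function partitionShards(groups: TestGroup[], shardCount: number): TestGroup[][] {
  const total = groups.reduce((sum, g) => sum + g.tests.length, 0);
  const base = Math.floor(total / shardCount);
  const extra = total - base * shardCount; // the first `extra` shards get one more test
  const shards: TestGroup[][] = [];
  for (let s = 0; s < shardCount; s++) shards.push([]);
  let seen = 0; // number of tests in all groups before the current one
  for (const group of groups) {
    for (let s = 0; s < shardCount; s++) {
      const from = base * s + Math.min(extra, s);
      const to = from + base + (s < extra ? 1 : 0);
      if (seen >= from && seen < to) {
        shards[s].push(group);
        break;
      }
    }
    seen += group.tests.length;
  }
  return shards;
}

// Hypothetical durations (seconds): t1..t6 are slow logged-in tests,
// t7..t12 are quick logged-out tests, one test per group.
const durations: Record<string, number> = {};
const groups: TestGroup[] = [];
for (let i = 1; i <= 12; i++) {
  durations[`t${i}`] = i <= 6 ? 60 : 10;
  groups.push({ tests: [`t${i}`] });
}

// Total runtime per shard: sum of the durations of all tests it received.
const shardSeconds = partitionShards(groups, 4).map((shard) =>
  shard.reduce((sum, g) => sum + g.tests.reduce((acc, id) => acc + durations[id], 0), 0));
console.log(shardSeconds); // [ 180, 180, 30, 30 ]
```

Because the slice boundaries follow discovery order only, the two shards holding the slow tests take six times as long as the other two.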

Solution

This PR adds a new `shardingMode` configuration option that specifies which sharding algorithm to use:

`shardingMode: 'partition'`

This is the current behaviour and remains the default. Let me know if you have a better name for this algorithm.

`shardingMode: 'round-robin'`

Distributes the test groups more evenly. It:

1. sorts the test groups by number of tests, in descending order;
2. then iterates over the test groups and assigns each one to the shard with the fewest tests so far.

Here is a simple example where every test group contains a single test (e.g. with `--fully-parallel`):

         [  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]
Shard 1:    ^               ^               ^              : [  1, 5, 9 ]
Shard 2:        ^               ^               ^          : [  2, 6,10 ]
Shard 3:            ^               ^               ^      : [  3, 7,11 ]
Shard 4:                ^               ^               ^  : [  4, 8,12 ]

…or a more complex scenario where test groups contain different numbers of tests:

Original Order: [ [1], [2, 3], [4, 5, 6], [7], [8], [9, 10], [11], [12] ]
Sorted Order:   [ [4, 5, 6], [2, 3], [9, 10], [1], [7], [8], [11], [12] ]
Shard 1:           ^-----^                                                : [ [ 4,   5,   6] ]
Shard 2:                      ^--^                       ^                : [ [ 2,  3],  [8] ]
Shard 3:                              ^---^                    ^          : [ [ 9, 10], [11] ]
Shard 4:                                       ^    ^                ^    : [ [1], [7], [12] ]
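The `round-robin` steps can be sketched in TypeScript as follows. This is a minimal sketch, not the PR's code: `TestGroup` and `roundRobinShards` are hypothetical names, and ties between equally loaded shards are assumed to go to the lowest shard index, which reproduces both examples above. The procedure is essentially the classic greedy "longest processing time first" (LPT) heuristic for load balancing.

```typescript
// Hypothetical, simplified model: a test group is just a list of test ids.
type TestGroup = { tests: number[] };

// Sketch of the 'round-robin' mode as described above: sort groups by size,
// descending, then give each group to the shard with the fewest tests so far.
function roundRobinShards(groups: TestGroup[], shardCount: number): TestGroup[][] {
  const shards: TestGroup[][] = [];
  const load: number[] = []; // number of tests assigned to each shard so far
  for (let s = 0; s < shardCount; s++) {
    shards.push([]);
    load.push(0);
  }
  // Array.prototype.sort is stable, so equally sized groups keep discovery order.
  const sorted = groups.slice().sort((a, b) => b.tests.length - a.tests.length);
  for (const group of sorted) {
    let target = 0; // least-loaded shard, lowest index on ties
    for (let s = 1; s < shardCount; s++) {
      if (load[s] < load[target]) target = s;
    }
    shards[target].push(group);
    load[target] += group.tests.length;
  }
  return shards;
}

// The "more complex scenario" from above, in its original discovery order.
const groups: TestGroup[] = [[1], [2, 3], [4, 5, 6], [7], [8], [9, 10], [11], [12]]
  .map((tests) => ({ tests }));
const assignment = roundRobinShards(groups, 4).map((shard) => shard.map((g) => g.tests));
console.log(JSON.stringify(assignment)); // [[[4,5,6]],[[2,3],[8]],[[9,10],[11]],[[1],[7],[12]]]
```

Running it with twelve single-test groups yields `[1, 5, 9]`, `[2, 6, 10]`, `[3, 7, 11]` and `[4, 8, 12]`, matching the simple example.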

`shardingMode: 'duration-round-robin'`

This is very similar to `round-robin`, but it uses each test's duration from the previous run as the cost factor. Durations are read from `.last-run.json` when available. If a test cannot be found in `.last-run.json`, the average duration of the known tests is used instead. If no last-run information is available at all, the behaviour is identical to `round-robin`.
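A duration-weighted variant of the same greedy assignment is sketched below. The names `durationRoundRobinShards` and `TestDurations` are hypothetical, the `durations` argument stands in for the data read from `.last-run.json`, and the cost of 1 per test when no durations are recorded is an assumption chosen so that the result matches `round-robin`.

```typescript
// Hypothetical, simplified model: a test group is a list of test ids, and
// TestDurations mirrors the `testDurations` map proposed for .last-run.json.
type TestGroup = { tests: string[] };
type TestDurations = { [testId: string]: number };

// Sketch of the 'duration-round-robin' idea: the same greedy assignment as
// 'round-robin', but a group's cost is the sum of its tests' previous
// durations instead of its number of tests.
function durationRoundRobinShards(
  groups: TestGroup[],
  shardCount: number,
  durations: TestDurations = {},
): TestGroup[][] {
  // Unknown tests cost the average recorded duration. Assumption: with no
  // recorded durations at all, every test costs 1, i.e. plain 'round-robin'.
  const known = Object.keys(durations).map((id) => durations[id]);
  const fallback = known.length > 0 ? known.reduce((a, b) => a + b, 0) / known.length : 1;
  const costOf = (id: string): number =>
    Object.prototype.hasOwnProperty.call(durations, id) ? durations[id] : fallback;

  const shards: TestGroup[][] = [];
  const load: number[] = []; // accumulated expected duration per shard
  for (let s = 0; s < shardCount; s++) {
    shards.push([]);
    load.push(0);
  }
  // Most expensive groups first; the stable sort keeps discovery order on ties.
  const sorted = groups
    .map((group) => ({ group, cost: group.tests.reduce((sum, id) => sum + costOf(id), 0) }))
    .sort((a, b) => b.cost - a.cost);
  for (const { group, cost } of sorted) {
    let target = 0; // least-loaded shard, lowest index on ties
    for (let s = 1; s < shardCount; s++) {
      if (load[s] < load[target]) target = s;
    }
    shards[target].push(group);
    load[target] += cost;
  }
  return shards;
}

// Hypothetical durations (seconds): t1..t6 slow logged-in tests, t7..t12 quick ones.
const durations: TestDurations = {};
const groups: TestGroup[] = [];
for (let i = 1; i <= 12; i++) {
  durations[`t${i}`] = i <= 6 ? 60 : 10;
  groups.push({ tests: [`t${i}`] });
}
const shards = durationRoundRobinShards(groups, 4, durations);
const shardSeconds = shards.map((shard) =>
  shard.reduce((sum, g) => sum + g.tests.reduce((acc, id) => acc + durations[id], 0), 0));
console.log(shardSeconds); // [ 120, 120, 90, 90 ]
```

With these hypothetical durations (60s for six logged-in tests, 10s for six logged-out tests), the four shards end up at 120s, 120s, 90s and 90s, whereas consecutive slices of three tests would give 180s, 180s, 30s and 30s.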

Other changes

- Add `testDurations?: { [testId: string]: number }` to `.last-run.json`.
- Add a built-in `lastrun` reporter, which allows `merge-reports` to generate a `.last-run.json` file.
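Consumers of the new field have to cope with `.last-run.json` files written before this change. A minimal sketch of a tolerant reader, assuming only the proposed `testDurations?: { [testId: string]: number }` shape; `readTestDurations` is a hypothetical helper, and any other fields in the file are ignored:

```typescript
// Shape of the field this PR proposes to add to .last-run.json.
type TestDurations = { [testId: string]: number };

// Hypothetical helper: extract `testDurations` from the text of a
// .last-run.json file. Returns an empty map when the file is not valid JSON
// or the field is missing (e.g. a file written before this change), so a
// caller can fall back to count-based sharding.
function readTestDurations(lastRunJson: string): TestDurations {
  let parsed: unknown;
  try {
    parsed = JSON.parse(lastRunJson);
  } catch {
    return {};
  }
  if (typeof parsed !== "object" || parsed === null) return {};
  const raw = (parsed as { testDurations?: unknown }).testDurations;
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) return {};

  // Keep only finite, non-negative numeric durations; skip anything else.
  const record = raw as Record<string, unknown>;
  const result: TestDurations = {};
  for (const testId of Object.keys(record)) {
    const value = record[testId];
    if (typeof value === "number" && isFinite(value) && value >= 0) result[testId] = value;
  }
  return result;
}

const example = '{"testDurations":{"test-a":1200,"test-b":"oops"}}';
console.log(JSON.stringify(readTestDurations(example))); // {"test-a":1200}
```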

Appendix

Below are some runtime stats from a project I've been working on, which show the potential benefit of this change.

Each test run had to complete 161 tests. Individual test durations range from a few seconds to over 2 minutes.

The `partition` run gives the baseline performance and illustrates the problem quite well: one shard takes almost 16 minutes, while another completes in under 5 minutes.

The `round-robin` algorithm performs somewhat better, but one shard still takes twice as long as another.

The `duration-round-robin` run used the duration information from a previous run and achieved by far the best result: all shards completed in 10-11 minutes. 🏆 🎉

May 22 '24 15:05 muhqu

Maybe it's better to make this an option to allow restoring the old behaviour. ¯\_(ツ)_/¯

~~And… there should be unit tests, no?~~ Found them…

May 22 '24 15:05 muhqu

Test results for "tests 1"

37 failed

:x: [playwright-test] › playwright.ct-react.spec.ts:253:5 › should pass "key" attribute from JSX in variable
:x: [playwright-test] › runner.spec.ts:118:5 › should ignore subprocess creation error because of SIGINT
:x: [playwright-test] › shard.spec.ts:66:5 › should respect shard=1/2
:x: [playwright-test] › shard.spec.ts:80:5 › should respect shard=2/2
:x: [playwright-test] › shard.spec.ts:107:5 › should respect shard=2/3
:x: [playwright-test] › shard.spec.ts:119:5 › should respect shard=3/3
:x: [playwright-test] › shard.spec.ts:131:5 › should respect shard=3/4
:x: [playwright-test] › shard.spec.ts:151:5 › should respect shard=1/2 in config
:x: [playwright-test] › shard.spec.ts:170:5 › should work with workers=1 and --fully-parallel
:x: [playwright-test] › shard.spec.ts:66:5 › should respect shard=1/2
:x: [playwright-test] › shard.spec.ts:80:5 › should respect shard=2/2
:x: [playwright-test] › shard.spec.ts:107:5 › should respect shard=2/3
:x: [playwright-test] › shard.spec.ts:119:5 › should respect shard=3/3
:x: [playwright-test] › shard.spec.ts:131:5 › should respect shard=3/4
:x: [playwright-test] › shard.spec.ts:151:5 › should respect shard=1/2 in config
:x: [playwright-test] › shard.spec.ts:170:5 › should work with workers=1 and --fully-parallel
:x: [playwright-test] › shard.spec.ts:66:5 › should respect shard=1/2
:x: [playwright-test] › shard.spec.ts:80:5 › should respect shard=2/2
:x: [playwright-test] › shard.spec.ts:107:5 › should respect shard=2/3
:x: [playwright-test] › shard.spec.ts:119:5 › should respect shard=3/3
:x: [playwright-test] › shard.spec.ts:131:5 › should respect shard=3/4
:x: [playwright-test] › shard.spec.ts:151:5 › should respect shard=1/2 in config
:x: [playwright-test] › shard.spec.ts:170:5 › should work with workers=1 and --fully-parallel
:x: [playwright-test] › shard.spec.ts:66:5 › should respect shard=1/2
:x: [playwright-test] › shard.spec.ts:80:5 › should respect shard=2/2
:x: [playwright-test] › shard.spec.ts:107:5 › should respect shard=2/3
:x: [playwright-test] › shard.spec.ts:119:5 › should respect shard=3/3
:x: [playwright-test] › shard.spec.ts:131:5 › should respect shard=3/4
:x: [playwright-test] › shard.spec.ts:151:5 › should respect shard=1/2 in config
:x: [playwright-test] › shard.spec.ts:170:5 › should work with workers=1 and --fully-parallel
:x: [playwright-test] › shard.spec.ts:66:5 › should respect shard=1/2
:x: [playwright-test] › shard.spec.ts:80:5 › should respect shard=2/2
:x: [playwright-test] › shard.spec.ts:107:5 › should respect shard=2/3
:x: [playwright-test] › shard.spec.ts:119:5 › should respect shard=3/3
:x: [playwright-test] › shard.spec.ts:131:5 › should respect shard=3/4
:x: [playwright-test] › shard.spec.ts:151:5 › should respect shard=1/2 in config
:x: [playwright-test] › shard.spec.ts:170:5 › should work with workers=1 and --fully-parallel

1 flaky

:warning: [firefox-page] › page/page-request-continue.spec.ts:481:3 › continue should not change multipart/form-data body

26993 passed, 610 skipped :heavy_check_mark::heavy_check_mark::heavy_check_mark:
