Polykey
Polykey copied to clipboard
Apply bin packing algorithms to the setup of our parallel CI/CD test runners
Specification
We want to introduce test load balancing to better parallelise and speed up our CI/CD tests. Part of this task is determining the most efficient number of shards to use, since we want to have the lowest number of parallel runners with the highest rate test completion. Such a task falls under the scope of a bin packing problem, which is NP-hard, however there are approximations we can use to make the problem easier.
On top of simply choosing the best number of shards to use, we want the shards themselves to be evenly balanced so that every test runner finishes at approximately the same time. We can use cached timing information from previous test runs to help to make this decision.
Additional context
- https://github.com/MatrixAI/TypeScript-Demo-Lib/pull/65#issuecomment-1168078968 - Summary of Jest's default sharding/sorting algorithm, along with comparisons with existing solutions
- https://github.com/kamilkisiela/split-tests - Very simple algorithm for keeping shards approximately the same size (no calculation for the most efficient number of shards though)
- https://github.com/MatrixAI/js-polykey/issues/392 - Issue for introducing test load balancing to PK
Tasks
- ...
- ...
- ...
To do this, a custom test sequencer that overrides the shard
method would need to be used. This experimented with prior in https://github.com/MatrixAI/TypeScript-Demo-Lib/pull/65
Doesn't seem relevant anymore, testing is pretty quick now by spreading the workload between Polykey and Polykey-CLI.
The sharding system is still being used (even in the migration to GitHub in https://github.com/MatrixAI/Orchestrator/issues/10), as can be seen by:
scripts/build-platforms-generate.sh
10:# Using shards to optimise tests
14:# Number of parallel shards to split the test suite into
136: - npm test -- --ci --coverage --shard="$CI_NODE_INDEX/$CI_NODE_TOTAL" --maxWorkers=50%
164: - npm test -- --ci --coverage --shard="$CI_NODE_INDEX/$CI_NODE_TOTAL" --maxWorkers=50%
However it's just relying on whatever jest's default sharing allocation algorithm is.