Improve ct_master
At RabbitMQ we have started using ct_master to greatly speed up our test runs in CI, moving away from an approach that was caching test results and guessing what tests we had to run again, to an approach that runs all tests but with greater concurrency. (We changed approaches due to circumstances beyond our control, not based on technical merit, but that's a story for another time.)
Greater concurrency here means running multiple test suites at the same time on a single machine (the same one that ct_master runs on).
With ct_master we were quickly able to run all test suites in the rabbit application (https://github.com/rabbitmq/rabbitmq-server/tree/main/deps/rabbit/test), of which there are many, in under 30 minutes on our development machines.
We pushed forward and applied the same principles to CI for our two biggest applications and were able to cut down the run time to around 13 minutes using 4 workers for rabbit, and 1 worker for rabbitmq_mqtt, both using ct_master. Our other applications are smaller and have not needed this treatment applied to them.
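For readers unfamiliar with ct_master, the kind of setup described above can be sketched as a test specification that assigns suites to worker nodes. This is only an illustration: the node names, suite names, and directories below are hypothetical and are not RabbitMQ's actual configuration. It also assumes the worker nodes are already running and reachable from the node that runs ct_master.

```erlang
%% Hypothetical test specification, e.g. saved as "ct.master.spec".
%% Node names, suites, and paths are made up for illustration.

%% Give each worker node a short alias.
{node, worker1, 'ct_worker1@localhost'}.
{node, worker2, 'ct_worker2@localhost'}.

%% Where the workers and the master write their logs.
{logdir, all_nodes, "logs/"}.
{logdir, master, "logs/"}.

%% Spread the suites across the workers so they run concurrently.
{suites, worker1, "test/", [first_SUITE, second_SUITE]}.
{suites, worker2, "test/", [third_SUITE]}.
```

Such a specification would then be run from the master node with `ct_master:run("ct.master.spec")`.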
I am also interested in making this parallel execution a feature of Erlang.mk at a later time, when the needed functionality is available in OTP directly.
While it is functional, ct_master is in a fairly bad state, and this PR aims to improve that. It includes a fix for https://github.com/erlang/otp/issues/8911 as well as additional functionality. Most of the changes are not controversial, although the last two commits may be:
- ct_master: Return results from ct_master:run: This is a breaking change. But I seriously doubt there's even 1 user of ct_master outside us (and we are using a forked module).
- ct_master: Print auto-skipped and failed test cases: This uses the builtin event handler, which wasn't doing much before, so perhaps that's not wanted.
I also noted that the ct_master_status module appears to be completely unused; I am happy to add a commit to delete it.
The equivalent RabbitMQ PR is at https://github.com/rabbitmq/rabbitmq-server/pull/12502 and one of the comments there links to a few test runs that use ct_master with all changes included in our PR here.
Note that I have not run (or added to!) OTP's CT tests at this point, hoping to get some feedback on the more controversial points, and whether master should be the target or if maint would be OK.
cc @lhoguin
CT Test Results
- 2 files, 58 suites, 1h 24m 23s :stopwatch:
- 451 tests: 438 :white_check_mark:, 12 :zzz:, 1 :x:
- 487 runs: 471 :white_check_mark:, 15 :zzz:, 1 :x:
For more details on these failures, see this check.
Results for commit fd460277.
To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.
See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.
// Erlang/OTP Github Action Bot
The test failure is because of the breaking change mentioned earlier.