otp: Improve ct_master

At RabbitMQ we have started using ct_master to greatly speed up our test runs in CI. We moved away from an approach that cached test results and guessed which tests had to run again, to an approach that runs all tests but with greater concurrency. (We changed approaches due to circumstances beyond our control, not based on technical merit, but that's a story for another time.)

Greater concurrency here means running multiple test suites at the same time on a single machine (the same one that ct_master runs on).
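As a rough illustration of what this looks like (not our actual configuration): ct_master reads a test specification in which `node` terms declare the worker nodes, and other terms can be scoped to a given node. The node names, directory, and suite names below are hypothetical, and the sketch assumes both worker nodes are already running on the local machine and reachable from the node that calls ct_master.

```erlang
%% Hypothetical test specification, e.g. saved as "parallel.spec".
%% Two worker nodes on the same host as the ct_master node.
{node, worker1, 'ct_worker1@localhost'}.
{node, worker2, 'ct_worker2@localhost'}.

%% Each worker runs its own subset of suites (names are made up).
{suites, worker1, "test", [example_a_SUITE, example_b_SUITE]}.
{suites, worker2, "test", [example_c_SUITE]}.

%% Log directories for the workers and for the master itself.
{logdir, all_nodes, "logs"}.
{logdir, master, "logs"}.
```

With a specification like this, the run would be started from the master node with `ct_master:run("parallel.spec")`, and the two workers would execute their suites concurrently.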

With ct_master we were quickly able to run all test suites in the rabbit application (https://github.com/rabbitmq/rabbitmq-server/tree/main/deps/rabbit/test), and there are a lot of them, in under 30 minutes on our development machines.

We pushed forward and applied the same principles to CI for our two biggest applications and were able to cut down the run time to around 13 minutes using 4 workers for rabbit, and 1 worker for rabbitmq_mqtt, both using ct_master. Our other applications are smaller and have not needed this treatment applied to them.

I am also interested in making this parallel execution a feature of Erlang.mk at a later time, when the needed functionality is available in OTP directly.

While it is functional, ct_master is in a fairly bad state, and this PR aims to improve that. It includes a fix for https://github.com/erlang/otp/issues/8911 as well as additional functionality. Most of the changes are not controversial, although the last two commits may be:

- ct_master: Return results from ct_master:run: This is a breaking change. But I seriously doubt there's even one user of ct_master other than us (and we are using a forked module).
- ct_master: Print auto-skipped and failed test cases: This uses the builtin event handler, which wasn't doing much before, so perhaps that's not wanted.

I also noticed that the ct_master_status module appears to be completely unused; I'm happy to add a commit to delete it.

The equivalent RabbitMQ PR is at https://github.com/rabbitmq/rabbitmq-server/pull/12502, and one of the comments there links to a few test runs that use ct_master with all the changes from this PR included.

Note that I have not run (or added to!) OTP's CT tests at this point. I am hoping to get some feedback first on the more controversial points, and on whether master should be the target or if maint would be OK.

cc @lhoguin

Oct 16 '24 09:10 essen

CT Test Results

2 files, 58 suites, 1h 24m 23s :stopwatch:

| | Total | :white_check_mark: | :zzz: | :x: |
|---|---|---|---|---|
| Tests | 451 | 438 | 12 | 1 |
| Runs | 487 | 471 | 15 | 1 |

For more details on these failures, see this check.

Results for commit fd460277.

:recycle: This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

Artifacts

// Erlang/OTP Github Action Bot

Oct 16 '24 09:10 github-actions[bot]

The test failure is caused by the breaking change mentioned earlier (ct_master:run now returning results).

Oct 16 '24 13:10 essen
