mmtk-core Rework on CI stress tests

Currently, our stress tests run on every commit to master, and because of the 3-hour timeout, none of them can finish.

We need to rework the stress tests. Ideally:

The stress tests should cover every combination of [benchmark x binding x plan].
We estimate that each successful run would take more than 5 hours.
Each run should be as simple as running one benchmark on one binding with one plan, e.g. fop-openjdk-semispace.
When a run finishes, the next run should start. It should pick up the latest commit and run on that.
When all the runs have finished, we start another round.
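
A minimal sketch of this scheduling, in Python. The benchmark, binding and plan lists are hypothetical placeholders, and `next_run` is an illustrative helper, not existing MMTk code. Each call returns the combination after the last one that ran, wrapping around to start a new round, paired with whatever commit is latest at that moment.

```python
from itertools import product

# Hypothetical example lists; the real benchmark/binding/plan sets are not fixed here.
BENCHMARKS = ["fop", "lusearch"]
BINDINGS = ["openjdk"]
PLANS = ["semispace", "immix"]

# Every [benchmark x binding x plan] combination, in a fixed order.
COMBINATIONS = [f"{b}-{v}-{p}" for b, v, p in product(BENCHMARKS, BINDINGS, PLANS)]


def next_run(last_run, latest_commit):
    """Pick the combination after `last_run` (wrapping around to start a new round)
    and pair it with the latest commit, not the commit the last run used."""
    if last_run is None or last_run not in COMBINATIONS:
        index = 0
    else:
        index = (COMBINATIONS.index(last_run) + 1) % len(COMBINATIONS)
    return COMBINATIONS[index], latest_commit
```

For example, after `fop-openjdk-immix` the next run is `lusearch-openjdk-semispace`, and after the last combination the schedule wraps back to `fop-openjdk-semispace` on the then-latest commit.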

Depending on whether this can be implemented with GitHub Actions, we may adjust the design a bit. The principles are: 1. keep running the tests; 2. each test picks up the latest commit (not every commit); 3. if any test fails, we manually bisect to find which commit introduced the bug.
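
Because each run tests the latest commit rather than every commit, several commits can land between the last passing run of a combination and its first failing run, which is why a bisect is needed. `git bisect` automates this; the sketch below only illustrates the binary search it performs, assuming the failure reproduces deterministically and `is_good` (a hypothetical callback) reruns the failing stress test on one commit.

```python
def first_bad_commit(commits, is_good):
    """Binary search for the first bad commit.

    `commits` is ordered oldest to newest; commits[0] is assumed known-good
    (e.g. the commit of the last passing stress run) and commits[-1] known-bad
    (the commit of the failing run). `is_good(commit)` is a hypothetical
    callback that reruns the failing stress test and returns True on a pass.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]
```

Stress failures can be intermittent, so in practice each bisect step may need several reruns before a commit is judged good.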

May 21 '21 03:05 qinsoon

Some benchmarks may take a long time to run with a stress factor of 1 (a GC after every byte allocated). We can set a timeout for each run, such as 1 day. If the run times out, we can rerun it with a stress factor of 1 page.
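
To see why a stress factor of 1 is so slow, here is an idealized count of collections, assuming one GC is triggered each time another `stress_factor` bytes have been allocated and a page is 4 KiB. The 1 GiB total allocation is a hypothetical figure, not a measurement of any benchmark.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def stress_gc_count(allocated_bytes, stress_factor):
    """Idealized number of GCs if one is triggered every `stress_factor` bytes allocated."""
    return allocated_bytes // stress_factor


# A hypothetical benchmark that allocates 1 GiB in total.
allocated = 1 << 30
print(stress_gc_count(allocated, 1))          # 1073741824 GCs
print(stress_gc_count(allocated, PAGE_SIZE))  # 262144 GCs
```

Moving from 1 byte to 1 page cuts the number of collections by a factor of 4096, which is what makes the fallback run fit in a bounded time.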

May 24 '21 03:05 qinsoon

This is my proposal to implement stress CI. Can you please take a look and let me know if this is reasonable? @wks @caizixian @steveblackburn

Store run state and results (GitHub repo)

We need a place to store the result of each stress run so that we can monitor the status of the stress tests. We will also use the last run's result to determine what to run next ([benchmark x binding x plan]). We can use a private repo (in the same way as our ci-perf-result) and render the results in a custom way with GitHub Pages.
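
The proposal does not fix a storage format. As one possibility, each run could be stored as a JSON record in the results repo; the `StressRun` fields and helper names below are hypothetical. The bot would read the records back and use the most recent one to choose the next combination.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class StressRun:
    # Hypothetical schema for one stored result.
    benchmark: str
    binding: str
    plan: str
    commit: str
    state: str  # e.g. "Run#1 success", "Run#2 timeout"


def dump_runs(runs):
    """Serialize all recorded runs (oldest first) to JSON text for the results repo."""
    return json.dumps([asdict(r) for r in runs], indent=2)


def parse_runs(text):
    """Parse JSON text from the results repo back into StressRun records."""
    return [StressRun(**d) for d in json.loads(text)]


def last_run(runs):
    """The most recent run, used to decide which combination runs next."""
    return runs[-1] if runs else None
```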

Drive a run (GitHub bot)

We have set up a GitHub bot for MMTk (private repo). We can use the bot to monitor and schedule new stress jobs. The bot periodically checks whether any stress job is in progress. If none is, the bot will 1. dispatch a new stress test job (read the last run state, determine the next stress run ([benchmark x binding x plan]), and dispatch the stress job), and 2. dispatch another workflow to render the results.
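
A sketch of one periodic check by the bot, with the GitHub calls injected as callbacks so the control flow is visible. `bot_tick`, the workflow names `stress` and `render`, and the `run` input are hypothetical. The request builder targets GitHub's REST endpoint for creating a `workflow_dispatch` event (`POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`, whose body takes `ref` and `inputs`); whether the bot calls that endpoint directly is an assumption. In-progress runs could similarly be listed with `GET /repos/{owner}/{repo}/actions/runs?status=in_progress`.

```python
def bot_tick(in_progress_jobs, last_run, pick_next, dispatch):
    """One periodic check by the bot.

    in_progress_jobs: stress jobs currently running (empty if none).
    pick_next(last_run): returns the next [benchmark x binding x plan] to run.
    dispatch(workflow, inputs): triggers a workflow run.
    Returns the dispatched combination, or None if a job is still running.
    """
    if in_progress_jobs:
        return None  # a stress job is still running; check again on the next tick
    combo = pick_next(last_run)
    dispatch("stress", {"run": combo})  # 1. start the next stress job
    dispatch("render", {})              # 2. re-render the results page
    return combo


def workflow_dispatch_request(owner, repo, workflow_file, ref, inputs):
    """Build (URL, JSON body) for GitHub's workflow_dispatch REST endpoint."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow_file}/dispatches")
    return url, {"ref": ref, "inputs": inputs}
```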

Stress test (GitHub workflow)

The stress test job is a manually dispatched GitHub workflow, triggered by the bot. Each job first runs the given [benchmark x binding x plan] with a fine-grained stress factor and a long timeout (Run#1, e.g. a stress factor of 8 bytes for 1 day). If Run#1 times out, the job runs the same [benchmark x binding x plan] again with a coarse-grained stress factor and a shorter timeout (Run#2, e.g. a stress factor of 4K bytes for 5 hours). If either Run#1 or Run#2 passes, we mark the run as a success; otherwise, we mark it as a failure. In both cases, we store the run state. The stress factors and timeouts need to be chosen carefully: Run#2 must run to completion (success or failure), and Run#1 must have a reasonable timeout.
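
The two-attempt logic of a stress job can be sketched as follows. The stress factors and timeouts are the example values from the proposal, and `stress_job` and `subprocess_runner` are hypothetical helpers. The runner assumes the binding reads the stress factor from the `MMTK_STRESS_FACTOR` environment variable and that the benchmark command exits with status 0 on success; `subprocess.run` kills the child and raises `TimeoutExpired` when the timeout elapses.

```python
import os
import subprocess

HOUR = 60 * 60
# Example settings from the proposal: (stress factor in bytes, timeout in seconds).
RUN1 = (8, 24 * HOUR)
RUN2 = (4096, 5 * HOUR)


def stress_job(run_benchmark):
    """run_benchmark(stress_factor, timeout) returns "pass", "fail" or "timeout".
    Returns the final state recorded for this [benchmark x binding x plan]."""
    result = run_benchmark(*RUN1)
    if result == "pass":
        return "Run#1 success"
    if result == "fail":
        return "Run#1 fail"
    # Run#1 timed out: retry with a coarser stress factor and a shorter timeout.
    result = run_benchmark(*RUN2)
    return {"pass": "Run#2 success", "fail": "Run#2 fail", "timeout": "Run#2 timeout"}[result]


def subprocess_runner(cmd):
    """Build a run_benchmark callback that executes the (hypothetical) benchmark command.
    Assumes the binding picks up MMTK_STRESS_FACTOR from the environment."""
    def run(stress_factor, timeout):
        env = dict(os.environ, MMTK_STRESS_FACTOR=str(stress_factor))
        try:
            proc = subprocess.run(cmd, env=env, timeout=timeout)
        except subprocess.TimeoutExpired:
            return "timeout"
        return "pass" if proc.returncode == 0 else "fail"
    return run
```

Only a Run#1 timeout triggers Run#2, matching the proposal; a Run#1 failure is recorded directly.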

Render results (GitHub workflow)

This workflow is also triggered by the bot. It reads results from the result repo, then renders and deploys them as a GitHub page. We will render all results for a given binding and benchmark together so that we can easily see the history of past runs. Each run ends in one of these states: Run#1 success, Run#2 success, Run#1 fail, Run#2 fail, Run#2 timeout.
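
A sketch of the grouping step of the render workflow, assuming each stored result carries its benchmark, binding, plan, commit and final state. Plain text stands in for the generated GitHub page, and the function names are hypothetical. The state check enforces the five final states listed above; a Run#1 timeout never appears on its own because it always leads to Run#2.

```python
from collections import defaultdict

STATES = {"Run#1 success", "Run#2 success", "Run#1 fail", "Run#2 fail", "Run#2 timeout"}


def group_history(runs):
    """runs: iterable of (benchmark, binding, plan, commit, state), oldest first.
    Groups them by (binding, benchmark) so each section shows that pair's history."""
    history = defaultdict(list)
    for benchmark, binding, plan, commit, state in runs:
        if state not in STATES:
            raise ValueError(f"unknown state: {state}")
        history[(binding, benchmark)].append((plan, commit, state))
    return dict(history)


def render_text(history):
    """Render one section per (binding, benchmark), one line per run."""
    lines = []
    for (binding, benchmark), entries in sorted(history.items()):
        lines.append(f"{binding} / {benchmark}")
        for plan, commit, state in entries:
            lines.append(f"  {plan:<12} {commit:<10} {state}")
    return "\n".join(lines)
```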

Stress CI machine (GitHub Actions runner)

The machine used for the stress test CI is configured as a regular self-hosted GitHub Actions runner, with a label that differs from those of the perf CI machines.

Dec 01 '21 04:12 qinsoon