feat: set up very expensive tests to run in CI
Related Issues
Closes https://github.com/filecoin-project/lotus/issues/12136
Proposed Changes
In this PR I set up a new workflow responsible for running very expensive tests. I also enable running expensive tests in our regular test workflow.
Additional Info
This is an updated version of https://github.com/filecoin-project/lotus/pull/12234
In this iteration, the very expensive tests workflow is separate to the test workflow. This is so that we don't have to rerun all the tests on a label change.
The way the new workflow works is that when one adds a need/very-expensive-tests label to a PR, the workflow will start, remove the label, and execute the tests.
The new workflow will only execute the tests that have very expensive components to them. The information about that is currently stored in the ci cmd tool. It will be moved to the test files themselves in the future once we move more towards the self-identification setup.
This is not ready to be merged because the only very expensive test we have at the moment is failing with signal: killed. Here's a link to the most recent failure: https://github.com/filecoin-project/lotus/actions/runs/13706584703
Question: should the workflow comment on a PR with the very expensive test results? I think it might be useful since if one modifies labels after the very expensive tests are run, then the link to the very expensive tests run would disappear from the PR.
Checklist
Before you mark the PR ready for review, please make sure that:
- [ ] Commits have a clear commit message.
- [ ] PR title conforms with contribution conventions
- [ ] Update CHANGELOG.md or signal that this change does not need it per contribution conventions
- [ ] New features have usage guidelines and / or documentation updates in
- [ ] Tests exist for new functionality or change in behavior
- [ ] CI is green
Why do we remove need/very-expensive-tests? I assume because this is the way to manually trigger the expensive test run for the PR.
I guess in a perfect world if the need/very-expensive-tests label was present, the expensive test groups would become a required CI check and would automatically get run whenever other tests are run.
I was trying to avoid the need to rerun the very expensive tests whenever unrelated labels are changed because these tests are expensive to run. That's also the reason why I didn't want to bundle them with the other tests. The problem is that we cannot configure the workflow to trigger only when a specific label changes. It's all or nothing.
However, I did get another idea for how we might approach it. Maybe we could drop the removal of the label and, instead, if an unrelated label is modified, we would look up the last run of the very expensive tests and just copy the result from it. WDYT?
I think PR's labeled with need/very-expensive-tests should have the comment in the PR. (Ideally it would behave the same as every other test, except this PR just has extra tests being run as a result of the label.).
If we implement that new idea, we wouldn't have to worry about the comments.
For a PR, I don't think we should be cutting issues if very expensive tests fail. I think we should only cut an issue if the very expensive cron job fails.
👍
Looking at the bad test: LOTUS_RUN_VERY_EXPENSIVE_TESTS=1 go test ./itests/niporep_manual_test.go -run TestManualNISectorOnboarding/real_proofs,_1_miner_with_7_sectors,_1_bad -v
At the peak, only about 700 epochs in, which is where the test is getting killed (quite early, this isn't timeout related) we get to ~32GiB of RAM, so I assume this is the limitation we're dealing with for these runners?
3836338 rvagg 20 0 4226.9g 31.6g 2.7g S 3064 16.8 7:55.08 /tmp/go-build2459112198/b001/itests.test -test.testlogfile=/tmp/go-build2459112198/b001/test+
It goes down from there, this is the initial part where it's generating the initial proof so it's a lot of work. There's only this one case that's in un here and it's the real_proofs that make it "very expensive". What options do we have on the runner side here because I'm not sure how much lattitude we have on the software side because I'm suspecting this is almost all in the proofs code; but I can explore that further if we have no options. Could we add some small swap just for this case if we're strictly limited to 32GiB? We shouldn't need much at all, I think we're probably just pushing a little over 32GiB for the system in this run.
btw once we get into it, it goes way down:
3836338 rvagg 20 0 4185.4g 3.7g 1.3g S 53.3 2.0 228:35.20 /tmp/go-build2459112198/b001/itests.test -test.testlogfile=/tmp/go-build2459112198/b001/test+
but it does take a long time to run this test because of the amount of time needed to tick through all the epochs required. In my test run it's got up to 15,478 epochs 556.76s; on a fairly fast system.
@galargh : apologies for the delay here. A few things:
For getting the current VERY_EXPENSIVE_TEST to pass, it seems like we need more memroy (slightly more than 32GB). Can this be accomplished with beefier runners or can we enable some swap space on the runners we have (more info)?
In terms of the interaction model:
- Periodic running where failures cut an issue - it seems like we're covered here ✅
- Selectively running at times a manual step in the Lotus release process or during a PR that we think warrants an explicit change - It sounds like the model we're looking at is triggering a a test run by applying a label. I am understanding the interaction better where applying the PR "run expensive tests" label queues up the test run and when the test run completes, the results are linked as a comment in the PR and the PR "run expensive tests" label is removed. The PR label is removed so that there is a way to explicitly re-request "run expensive tests" on the PR again (e.g., after pushing additional commits). I think that works and if that's a lot easier from an implementation regard, lets go with it. I don't want to block this PR further on best being the enemy of better.
- I was envisioning a world where whenever we'd run our tests normally there would be a step to check if the PR "run expensive tests" label was present on the PR and if so also run expensive tests and if not skip the expensive tests. But per above, if that isn't easily feasible, lets skip that.
I moved the tests to a slightly bigger runner and it looks like we're through :) I also updated the very expensive tests workflow so that it behaves similarly to our regular one. The one difference is, it will also get triggered whenever a label is added to a PR.
This is the diff of test.yml from master vs resuable-test.yml - https://github.com/filecoin-project/lotus/commit/6cc0c97d46a21a4f7ced55bff679071dd6218681
ACK, thanks, this is good to go I think