gotestsum
Add tool ci-matrix
This PR adds a new subcommand, `gotestsum tool ci-matrix`, for use with GitHub Actions. The subcommand:
- reads `test2json` files saved in a GitHub Actions cache from previous test runs
- uses the `test2json` data to calculate the 85th percentile runtime of each package
- uses the package runtime to bucket all the packages into `n` buckets

A GitHub Actions workflow can then use those buckets in a matrix, so that packages are split between the buckets based on their runtime. This should result in optimal splitting of packages to minimize overall CI runtime.
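To illustrate the idea, here is a rough sketch of percentile estimation plus greedy bucketing. This is my illustration of the technique described above, not gotestsum's actual implementation; the type and function names are made up.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// pkgTiming holds observed runtimes (in seconds) for one package
// across previous CI runs. Hypothetical type for illustration.
type pkgTiming struct {
	name string
	runs []float64
}

// percentile returns the value at the given percentile (0-100) of
// the samples, using the nearest-rank method.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}

// bucket greedily assigns packages, largest estimated runtime first,
// to whichever of the n buckets currently has the smallest total.
func bucket(pkgs []pkgTiming, n int) [][]string {
	type est struct {
		name    string
		runtime float64
	}
	ests := make([]est, len(pkgs))
	for i, p := range pkgs {
		ests[i] = est{p.name, percentile(p.runs, 85)}
	}
	sort.Slice(ests, func(i, j int) bool { return ests[i].runtime > ests[j].runtime })

	buckets := make([][]string, n)
	totals := make([]float64, n)
	for _, e := range ests {
		min := 0
		for i := 1; i < n; i++ {
			if totals[i] < totals[min] {
				min = i
			}
		}
		buckets[min] = append(buckets[min], e.name)
		totals[min] += e.runtime
	}
	return buckets
}

func main() {
	pkgs := []pkgTiming{
		{"pkg/big", []float64{120, 130, 125}},
		{"pkg/mid", []float64{40, 45}},
		{"pkg/small", []float64{5, 6}},
		{"pkg/tiny", []float64{1, 1}},
	}
	for i, b := range bucket(pkgs, 2) {
		fmt.Printf("bucket %d: %v\n", i, b)
	}
	// bucket 0: [pkg/big]
	// bucket 1: [pkg/mid pkg/small pkg/tiny]
}
```

Note how the dominant package ends up alone in its bucket, which matches the observation later in this thread that one slow package bounds the overall speedup.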
Using the subcommand requires integration with GitHub Actions, so I'll need to document exactly how to wire it all together. Once it's working well, maybe it would make sense to publish an action to handle parts of it.
Hey @dnephin, I'd like to look at using this to improve CI on pulumi/pulumi. Any way I can help review this, validate it?
Hey @AaronFriel, that would be great! The main project I'm working on these days has a test suite that is just fast enough that it doesn't benefit from this much (at least not yet). It would be great to try it out on a larger project with more tests.
I'll work on getting this running on a schedule on my project to see how test runtime improves as the cache collects more data from previous runs. Once that's done I'll share a link to the updated config in this PR.
If you have any questions about how this works, or how to use it, I'd be happy to answer them.
I'm ready to merge this as soon as I've seen it run well in a real CI workflow. If you have any suggestions for the code, or the feature, please do share!
This diff seems to be working well.
It adds 3 things:
- a `test-matrix` job that uses the `gotestsum tool ci-matrix` command to create the JSON matrix using previous test timing. The matrix is a job output.
- the `test` job is updated to use that matrix and to output timing reports as artifacts.
- the `test-collect-reports` job runs after `test` to download the artifacts from each of the parallel jobs, and save them into a cache, so that the next run of `test-matrix` can use the test reports.
In this project the total runtime of the test job was just over 4 minutes, and it's just under 3 minutes with the matrix. In this case the largest package is responsible for 2+ minutes of the test run, so the benefit from running tests in parallel is limited by that one package.
@dnephin I definitely have a larger, and more expensive, matrix of tests we want to run this on (recent run). We've been breaking out the tests by hand, but maintaining that is a bit frustrating and doesn't make for easy opportunities to split tests into more groups.
How would this branch handle adding a new package in a PR, that wasn't in the previous test run?
I think in terms of using it, we had the same thought: using the actions cache to restore before and after. We'll bake a default profile into a JSON file in the repo, from a prior run or a local run, to ensure we have some data.
For pulumi/pulumi to use this, could you push a tag like `ci-matrix-0.1` for us to use, for security and reliability? In my experience branches are mutable refs, and when GitHub prunes commits (because a branch was deleted, for example) `go install ...@commitish` can fail.
Edit: I think you can disregard the message below. I found two cases where we had an inner loop and were running a subtest-per-item with approximately 1 million items. Removing `t.Run()` looks to be a good perf improvement and reduces the size of these files by >99%.
Oh - another thing: is there a more compact representation of the JSON reports, or a way to simplify/prune/distill the information as a preprocessing step? We produce roughly 3.5GiB of JSON output, which is a little over a third of the cache capacity of a GitHub repo. We'd obliterate all our other caches in 3 runs.
Could we run a `jq` pass - or even better, use the result of `ci-matrix` as the cached output between runs?
```
ls ../test-reports/ -lah
total 3.5G
drwxr-xr-x  2 friel friel 4.0K Sep  3 21:09 .
drwxr-xr-x 19 friel friel 4.0K Sep  3 21:09 ..
-rw-r--r--  1 friel friel  28K Sep  3 19:47 438ea576-9267-47fe-a023-b5fc4814deaf.json
-rw-r--r--  1 friel friel 1.2G Sep  3 19:56 5eabe5e9-db9d-48bd-bd13-7913c4eeb3e4.json
-rw-r--r--  1 friel friel  28K Sep  3 19:05 66073323-df78-4be8-b5df-224d5a148e14.json
-rw-r--r--  1 friel friel 3.0M Sep  3 19:25 874f76fe-2f2a-40bc-94b1-3ac53f7deb59.json
-rw-r--r--  1 friel friel 2.9M Sep  3 19:50 9685a1e1-e7da-42d2-97f3-62d59606bfcb.json
-rw-r--r--  1 friel friel  27K Sep  3 19:45 992d418b-ae8e-4dbe-ae08-e213c2628011.json
-rw-r--r--  1 friel friel  18K Sep  3 19:49 9b3ee554-2c7d-4931-82a0-744573de956d.json
-rw-r--r--  1 friel friel  18K Sep  3 19:08 a4b9a0ab-a257-4572-8a87-f7a9938f40a6.json
-rw-r--r--  1 friel friel  18K Sep  3 19:24 c78f5ecd-a86a-447d-ad23-481a1e43ae13.json
-rw-r--r--  1 friel friel 1.2G Sep  3 19:35 d7387e6b-c9ad-450b-94cd-0aa30e875b55.json
-rw-r--r--  1 friel friel  28K Sep  3 19:22 e7e22942-f20c-46ee-ab3a-4c1f9aaeb8ed.json
-rw-r--r--  1 friel friel 2.9M Sep  3 19:09 ed80e727-74c7-40e5-80e3-51da252b4f1a.json
-rw-r--r--  1 friel friel 1.2G Sep  3 19:20 f98ed980-ff60-47c7-97bd-c5b0063d13f2.json
```
How would this branch handle adding a new package in a PR, that wasn't in the previous test run?
Excellent question! To ensure that all tests are run, the full list of packages to test is always pulled from stdin. It should always be run this way:
```
go list ./... | gotestsum tool ci-matrix --debug --dir=
```
If a package has no timing data associated with it, it will be assigned to the bucket with the smallest amount of estimated runtime.
We'll bake in a default profile into a JSON file in the repo from a prior run or a local run to ensure we have some data.
Probably a good idea. In the case where there is no timing data, `gotestsum tool ci-matrix` will round-robin the packages into buckets. So you still end up with some improvement over running them all sequentially, but it won't be optimal.
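For illustration, the round-robin fallback amounts to something like the following. This is my sketch of the behavior described above, not the actual code in this branch.

```go
package main

import "fmt"

// roundRobin distributes packages across n buckets in order, which
// is the fallback when no timing data exists for any package.
func roundRobin(pkgs []string, n int) [][]string {
	buckets := make([][]string, n)
	for i, p := range pkgs {
		buckets[i%n] = append(buckets[i%n], p)
	}
	return buckets
}

func main() {
	fmt.Println(roundRobin([]string{"a", "b", "c", "d", "e"}, 2))
	// [[a c e] [b d]]
}
```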
For pulumi/pulumi to use this, could you push a tag like `ci-matrix-0.1` for us to use, for security and reliability?
Yes, no problem. I'll tag a pre-release with these changes by early next week. I'd like to make a few small improvements before tagging it.
is there a more compact representation of the JSON reports, or a way to simplify/prune/distill the information as a preprocessing step?
Oh ya, I was expecting this to be a problem for larger projects. Awesome that you found a way to reduce the size of the logs. I've worked on projects that output hundreds of lines of logs per test, which produced similarly large output.
I haven't solved that problem yet, but I have a couple of ideas for how it might be solved.
- One option would be to add a new command line flag to the main `gotestsum` command to tell it to only output events that are not `Action=Output`. Something like `--jsonfile-omit-output` (a boolean flag that would have to be used in addition to `--jsonfile`), or `--jsonfile-timing=<filename>` (a separate flag that writes only the important timing events, and none of the other events).
- The second option would be a new filter subcommand, `gotestsum tool filter`, that would do basically the same thing: accept a full log of events, and filter it based on the flags passed to that command.

`--jsonfile-timing=<filename>` seems like it might be the easiest to use.
I had a quick look at your CI workflow. If you are looking to split a single package by test runtime, that is not supported yet. This command splits by package which is a bit easier.
I think it would be possible to add support for splitting a single package by test timing, but I think it'd be better to do that as a second PR after this one merges.
This is the updated example for the latest changes in this branch. The `gotestsum tool ci-matrix` command now outputs a full matrix JSON, which makes it easier to use in the dependent test job.
I've started that work in #277; it just needs a bit more testing, I think.
I'm already having tremendous success on the previous revision, example here: https://github.com/pulumi/pulumi/actions/runs/2996070671
(Ignore the failures, I'm bisecting our packages to determine which can run as unit tests and which require an installed CLI.)
Looking forward to this stabilizing.
`--jsonfile-timing=<filename>` seems like it might be the easiest to use.
Agreed. This would also help in case we inadvertently log secrets in a test. GitHub Actions will redact, but these test files would be raw output, I believe?
GitHub Actions will redact, but these test files would be raw output, I believe?
I believe that's right
Ok, I've got this running again over at https://github.com/dnephin/infra/actions/workflows/ci-core.yaml
I removed file pruning from `gotestsum tool ci-matrix`; it's easily done with

```
find /home/runner/reports/ -mindepth 1 -mtime +3 -delete
```
Or some other command. We can also restore file pruning later if we need it.
There's still lots to do for this new command:
- the new flag in #277
- support different output formats so that the tool is easier to integrate into existing tooling
- document the new subcommand in the README
- add `--jsonfile-timing`
I think this PR is ready to merge, and I'll create a new issue for tracking next steps. (#287)