gotestsum Add tool ci-matrix

This PR adds a new subcommand gotestsum tool ci-matrix for use with github actions. The subcommand:

reads test2json files saved in a github actions cache from previous test runs
uses the test2json data to calculate the 85th percentile runtime of each package
uses the package runtime to bucket all the packages into n buckets

A github actions workflow can then use those buckets in a matrix, so that packages are split between the buckets based on their runtime. This should result in optimal splitting of packages to minimize overall CI runtime.

Using the subcommand requires integration with github actions, so I'll need to document exact how to wire it all together. Once it's working well, maybe it would make sense to publish an action to handle parts of it.

Jun 20 '22 01:06 dnephin

Hey @dnephin, I'd like to look at using this to improve CI on pulumi/pulumi. Any way I can help review this, validate it?

Sep 03 '22 21:09 AaronFriel

They @AaronFriel , that would be great! The main project I'm working on these days has a test suites that is just fast enough that it doesn't benefit from this much (at least not yet). It would be great to try it out on a larger project with more tests.

I'll work on getting this running on a schedule on my project to see how test runtime improves as the cache collects more data from previous runs. Once that's done I'll share a link to the updated config in this PR.

If you have any questions about how this works, or how to use it, I'd be happy to answer them.

I'm ready to merge this as soon as I've seen it run well in a real CI workflow. If you have any suggestions for the code, or the feature, please do share!

Sep 03 '22 22:09 dnephin

This diff seems to be working well.

It adds 3 things:

a test-matrix job that uses this gotestsum tool ci-matrix command to create the JSON matrix using previous test timing. The matrix is a job output.
the test job is updated to use that matrix and to output timing reports as artifacts.
the test-collect-reports job runs after test to download the artifacts from each of the parallel jobs, and save them into a cache, so that the next run of test-matrix can use the test reports.

In this project the total runtime of the test job was just over 4 minutes, and it's just under 3 minutes with the matrix. In this case the largest package is responsible for 2+ minutes of the test run, so the benefit from run tests in parallel is limited by that one package.

Sep 04 '22 01:09 dnephin

@dnephin I definitely have a larger, and more expensive, matrix of tests we want to run this on (recent run). We've been relying on breaking out the tests by hand, but maintaining that by hand is a bit frustrating and doesn't make for easy opportunities to split tests into more groups.

How would this branch handle adding a new package in a PR, that wasn't in the previous test run?

I think in terms of using it, we had the same thought. Using the actions cache to restore before and after. We'll bake in a default profile into a JSON file in the repo from a prior run or a local run to ensure we have some data.

For pulumi/pulumi to use this, could you push a tag like ci-matrix-0.1 for us use for security and reliability? In my experience branches are mutable refs & when github prunes commits (because a branch was deleted, for example) go install ...@commitish can fail.

Sep 04 '22 04:09 AaronFriel

Edit: I think you can disregard the message below. I found two cases where we had an inner loop and were running a subtest-per-item with approximately 1 million items. Removing t.Run() looks to be a good perf improvement and reduces the size of these files by >99%.

Oh - another thing, is there a more compact representation of the json reports, a simplify/prune/distill the information as a preprocessing step? We produce roughly 3.5GiB of JSON output, which is a little over a third of the cache capacity of a GitHub repo. We'd obliterate all our other caches in 3 runs.

If we could run a jq pass - or even better, use the result of ci-matrix as the cached output between runs?

ls ../test-reports/ -lah
total 3.5G
drwxr-xr-x  2 friel friel 4.0K Sep  3 21:09 .
drwxr-xr-x 19 friel friel 4.0K Sep  3 21:09 ..
-rw-r--r--  1 friel friel  28K Sep  3 19:47 438ea576-9267-47fe-a023-b5fc4814deaf.json
-rw-r--r--  1 friel friel 1.2G Sep  3 19:56 5eabe5e9-db9d-48bd-bd13-7913c4eeb3e4.json
-rw-r--r--  1 friel friel  28K Sep  3 19:05 66073323-df78-4be8-b5df-224d5a148e14.json
-rw-r--r--  1 friel friel 3.0M Sep  3 19:25 874f76fe-2f2a-40bc-94b1-3ac53f7deb59.json
-rw-r--r--  1 friel friel 2.9M Sep  3 19:50 9685a1e1-e7da-42d2-97f3-62d59606bfcb.json
-rw-r--r--  1 friel friel  27K Sep  3 19:45 992d418b-ae8e-4dbe-ae08-e213c2628011.json
-rw-r--r--  1 friel friel  18K Sep  3 19:49 9b3ee554-2c7d-4931-82a0-744573de956d.json
-rw-r--r--  1 friel friel  18K Sep  3 19:08 a4b9a0ab-a257-4572-8a87-f7a9938f40a6.json
-rw-r--r--  1 friel friel  18K Sep  3 19:24 c78f5ecd-a86a-447d-ad23-481a1e43ae13.json
-rw-r--r--  1 friel friel 1.2G Sep  3 19:35 d7387e6b-c9ad-450b-94cd-0aa30e875b55.json
-rw-r--r--  1 friel friel  28K Sep  3 19:22 e7e22942-f20c-46ee-ab3a-4c1f9aaeb8ed.json
-rw-r--r--  1 friel friel 2.9M Sep  3 19:09 ed80e727-74c7-40e5-80e3-51da252b4f1a.json
-rw-r--r--  1 friel friel 1.2G Sep  3 19:20 f98ed980-ff60-47c7-97bd-c5b0063d13f2.json

Sep 04 '22 04:09 AaronFriel

How would this branch handle adding a new package in a PR, that wasn't in the previous test run?

Excellent question! To ensure that all tests are run, the full list of packages to test is always pulled from stdin. It should always be run this way:

go list ./... | gotestsum tool ci-matrix --debug --dir=

If a package has no timing data associated with it, it will be assigned to the bucket with the smallest about of estimated runtime.

We'll bake in a default profile into a JSON file in the repo from a prior run or a local run to ensure we have some data.

Probably a good idea. In the case where there is no timing data, gotestsum tool ci-matrix will round-robin the packages into buckets. So you still end up with some improvements over running them all sequentially, but it won't be optimal.

For pulumi/pulumi to use this, could you push a tag like ci-matrix-0.1 for us use for security and reliability?

Yes, no problem. I'll tag a pre-release with these changes by early next week. I'd like to make a few small improvements before tagging it.

Sep 04 '22 16:09 dnephin

is there a more compact representation of the json reports, a simplify/prune/distill the information as a preprocessing step?

Oh ya, I was expected this to be a problem for larger projects. Awesome that you found a way to reduce the size of the logs. I've worked on projects that output hundreds of lines of logs per test, which produced similarly large output.

I haven't solved that problem yet, but I have a couple of ideas for how it might be solved.

One option would be to add a new command line flag to the main gotestsum command to tell it to only output events that are not Action=Output. Something like --jsonfile-omit-output (boolean flag that would have to be used in addition to --jsonfile), or --jsonfile-timing=<filename> (a separate flag that writes only the important timing events, and none of the other events).
The second option would be a new filter subcommand gotestsum tool filter , that would do basically the same thing. Accept a full log of events, and filter it based on the flags passed to that command.

--jsonfile-timing=<filename> seems like it might be the easiest to use.

Sep 04 '22 16:09 dnephin

I had a quick look at your CI workflow. If you are looking to split a single package by test runtime, that is not supported yet. This command splits by package which is a bit easier.

I think it would be possible to add support for splitting a single package by test timing, but I think it'd be better to do that as a second PR after this one merges.

Sep 04 '22 19:09 dnephin

This is the updated example for the latest changes in this branch. The gotestsum tool ci-matrix command now outputs a full matrix json, which makes it easier to use in the dependent test job.

Sep 04 '22 20:09 dnephin

I've started that work in #277, just needs a bit more testing I think

Sep 05 '22 19:09 dnephin

I'm already having tremendous success on the previous revision, example here: https://github.com/pulumi/pulumi/actions/runs/2996070671

(Ignore the failures, I'm bisecting our packages to determine which can run as unit tests and which require an installed CLI.)

Looking forward to this stabilizing.

--jsonfile-timing=<filename> seems like it might be the easiest to use.

Agreed. This would also help in case we inadvertently log secrets in a test. GitHub Actions will redact, but these test files would be raw output, I believe?

Sep 05 '22 21:09 AaronFriel

GitHub Actions will redact, but these test files would be raw output, I believe?

I believe that's right

Nov 26 '22 18:11 dnephin

Ok, I've got this running again over at https://github.com/dnephin/infra/actions/workflows/ci-core.yaml

I remove file pruning from gotestsum tool ci-matrix, it's easily done with

find /home/runner/reports/ -mindepth 1 -mtime +3 -delete

Or some other command. We can also restore file pruning later if we need it.

There's still lots to do for this new command:

the new flag in #277
support different output formats so that the tool is easier to integrate into existing tooling
document the new subcommand in the README
add --jsonfile-timing

I think this PR is ready to merge, and I'll create a new issue for tracking next steps. (#287)

Nov 26 '22 18:11 dnephin

gotestsum gotestsum copied to clipboard

Add tool ci-matrix

gotestsum
gotestsum copied to clipboard