github-actions-golang
github-actions-golang copied to clipboard
Caching test results across action runs
I've found this repo very helpful, thank you! I've noticed that using the actions-cache as demonstrated in the repo doesn't result in tests being cached. I know on my local machine when I rerun tests for packages that haven't changed the results are cached, but when I run on GitHub Actions even with all the cache directories setup the tests aren't cached.
It'd help speed up tests for a lot of Go projects if we could figure that out.
I'm curious, is there a particular reason you want the test cache to be included? I generally want CI to always re-run the tests, so I can catch flaky tests more easily.
I'm fairly sure that the test results are also cached in the build cache; see go help cache. So the second cache example, which includes ~/.cache/go-build, should just work on Linux. There are a number of reasons why this might not work:
-
GitHub's env sets up the cache elsewhere, e.g. by setting
GOCACHEorXDG_CACHE_HOME. I don't think this is the case, as otherwise I imagine many others would have seen slow builds or empty/missingactions/cachesaves. -
Your env sets up the cache elsewhere. Always a possibility - but then you'd have to adapt the caching strategy. This repo just has examples.
-
The way Actions sets up the environment isn't deterministic, so that effectively invalidates all previous cached test results. I think this is the most likely answer; any change in environment variables or files read by Go's test process means that the cached test results can't be reused. If Actions is changing any env vars like
TMPDIRorGOPATH, which they might well be, then there's your answer. The reason I think this is likely is because they set up temporary runner environments for each job, so it's likely there's some randomness in some directory paths or environment variables used by Go.
I think for big repositories you can take advantage of the test cache and additionally set up a cron job to run all tests (e.g. on a nightly basis). I think it cool to have it documented (how it works. how to enable it, how to disable it etc).
Sure, I don't oppose improving the docs here. Someone has to dig into the actual reason why the test cache isn't being reused, though - I laid out three possibilities above. I don't have the time to dig into them right now.
I'm digging into this. On a small repo I've managed to get test caching working reliably using the instructions in this repo. The same instructions applied to another repo with no testdata, no file loading, and only one consistent environment variable results in no test caching.
I think it might be related to test caching not working when running a single test package vs ./.... 🤔
Maybe test caching is broken in Go when any env var is read. In my test example when an os.Getenv exists in the code, test caching never runs, even if the value I'm reading is consistent. If I remove the os.Getenv test caching works fine. 🤔
I think I know what the issue is. It's the cache key we're using. GitHub Actions will not resave the cache if it got a cache hit on cache load, which means if go.sum hasn't changed, then we won't update the cache even if other things have changed!
GitHub Actions will not resave the cache if it got a cache hit on cache load, which means if go.sum hasn't changed, then we won't update the cache even if other things have changed!
Hm, that's unfortunate. I guess it makes sense from a "pure deterministic cache" perspective on their end, but on our end it's very hard to provide a proper, full cache key that combines all inputs for Go's module deps, package builds, and tests.
CC @FiloSottile since he raised similar points in https://github.com/mvdan/github-actions-golang/issues/12.
I think using hashes of go.sum files is fine if we are only caching the module download cache. But if we're also caching package builds and tests, it's certainly not enough. Perhaps ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}-${{ hashFiles('**/*.go') would be a slightly better approximation - it would still not cover edge cases like testdata files or env vars, but it would be significantly more accurate than just go.sum files.
I'd go further and include all files in the hash, because test files could impact the build too. This is the pattern I'm using now:
- uses: actions/cache@v2
with:
path: ~/go/pkg/mod
key: ${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-go-mod-${{ hashFiles('**/go.sum') }}
restore-keys: ${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-go-mod
- uses: actions/cache@v2
if: ${{ github.ref != 'master' }}
with:
path: ~/.cache/go-build
key: ${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-go-all-${{ github.ref }}-${{ hashFiles('**', '!.git') }}
restore-keys: |
${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-go-all-${{ github.ref }}
${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-go-all
See it in context here: https://github.com/stellar/go/pull/3727
Note that I've chosen to include github.ref in the key because I want cache items from the current branch to be preferred over cache items from other branches.
If there was a way to serialize the matrix object into the cache key I'd do that so that all matrix params automatically had different caches, but I don't see a way of doing that with Actions.
Oh, wow, that's pretty long :P
At that point, practically any commit would result in a different cache key, right? I wonder how feasible it would be to just use something like...
key: ${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-${{ RANDOM_STRING }}
restore-keys: |
${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-${{ RANDOM_STRING }}
${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-
I assume the full github commit hash is a good approximation for "random".
At that point, practically any commit would result in a different cache key, right?
The cache key needs to be unique if the tests could result in different results, so those different results get cached. The restore keys value will ensure a recent cache is selected.
I assume the full github commit hash is a good
The commit head would work too. I'm taking a hash of all files in the working directory. A commit hash would also be sufficient. Could also just use the github actions build number.
The environment (like the versions of services like Postgres that might be in the matrix) is still important, so we still need those values in the cache key.