github-actions-golang Caching test results across action runs

I've found this repo very helpful, thank you! I've noticed that using the actions-cache as demonstrated in the repo doesn't result in tests being cached. I know on my local machine when I rerun tests for packages that haven't changed the results are cached, but when I run on GitHub Actions even with all the cache directories setup the tests aren't cached.

It'd help speed up tests for a lot of Go projects if we could figure that out.

Jun 29 '21 03:06 leighmcculloch

I'm curious, is there a particular reason you want the test cache to be included? I generally want CI to always re-run the tests, so I can catch flaky tests more easily.

I'm fairly sure that the test results are also cached in the build cache; see go help cache. So the second cache example, which includes ~/.cache/go-build, should just work on Linux. There are a number of reasons why this might not work:

GitHub's env sets up the cache elsewhere, e.g. by setting GOCACHE or XDG_CACHE_HOME. I don't think this is the case, as otherwise I imagine many others would have seen slow builds or empty/missing actions/cache saves.
Your env sets up the cache elsewhere. Always a possibility - but then you'd have to adapt the caching strategy. This repo just has examples.
The way Actions sets up the environment isn't deterministic, so that effectively invalidates all previous cached test results. I think this is the most likely answer; any change in environment variables or files read by Go's test process means that the cached test results can't be reused. If Actions is changing any env vars like TMPDIR or GOPATH, which they might well be, then there's your answer. The reason I think this is likely is because they set up temporary runner environments for each job, so it's likely there's some randomness in some directory paths or environment variables used by Go.

Jun 30 '21 09:06 mvdan

I think for big repositories you can take advantage of the test cache and additionally set up a cron job to run all tests (e.g. on a nightly basis). I think it cool to have it documented (how it works. how to enable it, how to disable it etc).

Jun 30 '21 10:06 pellared

Sure, I don't oppose improving the docs here. Someone has to dig into the actual reason why the test cache isn't being reused, though - I laid out three possibilities above. I don't have the time to dig into them right now.

Jun 30 '21 10:06 mvdan

I'm digging into this. On a small repo I've managed to get test caching working reliably using the instructions in this repo. The same instructions applied to another repo with no testdata, no file loading, and only one consistent environment variable results in no test caching.

I think it might be related to test caching not working when running a single test package vs ./.... 🤔

Jun 30 '21 16:06 leighmcculloch

Maybe test caching is broken in Go when any env var is read. In my test example when an os.Getenv exists in the code, test caching never runs, even if the value I'm reading is consistent. If I remove the os.Getenv test caching works fine. 🤔

Jun 30 '21 16:06 leighmcculloch

I think I know what the issue is. It's the cache key we're using. GitHub Actions will not resave the cache if it got a cache hit on cache load, which means if go.sum hasn't changed, then we won't update the cache even if other things have changed!

Jun 30 '21 16:06 leighmcculloch

GitHub Actions will not resave the cache if it got a cache hit on cache load, which means if go.sum hasn't changed, then we won't update the cache even if other things have changed!

Hm, that's unfortunate. I guess it makes sense from a "pure deterministic cache" perspective on their end, but on our end it's very hard to provide a proper, full cache key that combines all inputs for Go's module deps, package builds, and tests.

CC @FiloSottile since he raised similar points in https://github.com/mvdan/github-actions-golang/issues/12.

I think using hashes of go.sum files is fine if we are only caching the module download cache. But if we're also caching package builds and tests, it's certainly not enough. Perhaps ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}-${{ hashFiles('**/*.go') would be a slightly better approximation - it would still not cover edge cases like testdata files or env vars, but it would be significantly more accurate than just go.sum files.

Jul 14 '21 11:07 mvdan

I'd go further and include all files in the hash, because test files could impact the build too. This is the pattern I'm using now:

    - uses: actions/cache@v2
       with:
         path: ~/go/pkg/mod
         key: ${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-go-mod-${{ hashFiles('**/go.sum') }}
         restore-keys: ${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-go-mod
     - uses: actions/cache@v2
       if: ${{ github.ref != 'master' }}
       with:
         path: ~/.cache/go-build
         key: ${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-go-all-${{ github.ref }}-${{ hashFiles('**', '!.git') }}
         restore-keys: |
           ${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-go-all-${{ github.ref }}
           ${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-go-all

See it in context here: https://github.com/stellar/go/pull/3727

Note that I've chosen to include github.ref in the key because I want cache items from the current branch to be preferred over cache items from other branches.

If there was a way to serialize the matrix object into the cache key I'd do that so that all matrix params automatically had different caches, but I don't see a way of doing that with Actions.

Jul 14 '21 16:07 leighmcculloch

Oh, wow, that's pretty long :P

At that point, practically any commit would result in a different cache key, right? I wonder how feasible it would be to just use something like...

key: ${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-${{ RANDOM_STRING }}
restore-keys: |
  ${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-${{ RANDOM_STRING }}
  ${{ github.workflow }}-${{ github.job }}-${{ matrix.go }}-${{ runner.os }}-

I assume the full github commit hash is a good approximation for "random".

Jul 14 '21 17:07 mvdan

At that point, practically any commit would result in a different cache key, right?

The cache key needs to be unique if the tests could result in different results, so those different results get cached. The restore keys value will ensure a recent cache is selected.

I assume the full github commit hash is a good

The commit head would work too. I'm taking a hash of all files in the working directory. A commit hash would also be sufficient. Could also just use the github actions build number.

The environment (like the versions of services like Postgres that might be in the matrix) is still important, so we still need those values in the cache key.

Jul 14 '21 18:07 leighmcculloch

github-actions-golang github-actions-golang copied to clipboard

Caching test results across action runs

github-actions-golang
github-actions-golang copied to clipboard