Cache - excluding files or folders with ! not working
Describe the bug
Excluding a file or folder does not work as expected with caching. For example,
~/cache
!~/cache/subfolder
is matching:
../../../cache
I think this is because caching uses implicitDescendants: false. Since we're not traversing any descendants, the negation pattern does not match anything.
One option is to just set implicitDescendants: true, but this may add inefficiencies as we will need to enumerate all the files and generate a large "includes file" for tar. We essentially need it to scan the descendants but stop whenever the entire folder is included. For example, the following files:
~/cache/foo.txt
~/cache/bar.txt
~/cache/src/main.c
~/cache/src/lib/baz.dll
~/cache/bin/main.exe
with the glob patterns of ~/cache and !~/cache/bin should produce the list:
~/cache/foo.txt
~/cache/bar.txt
~/cache/src
To Reproduce
See https://github.com/dhadka/cache-test-exclusion
Expected behavior
Negated files / folders are excluded from the cache.
Any updates here?
There is a workaround. Excluding works if you have all patterns in the same "level", for example
path: |
~/.m2/repository/*/*/*
!~/.m2/repository/org/apache/pulsar
this is what we use in apache/pulsar, full example here:
https://github.com/apache/pulsar/blob/master/.github/workflows/ci-integration-process.yaml#L55-L65
There is a workaround. Excluding works if you have all patterns in the same "level", for example
path: | ~/.m2/repository/*/*/* !~/.m2/repository/org/apache/pulsarthis is what we use in apache/pulsar, full example here:
https://github.com/apache/pulsar/blob/master/.github/workflows/ci-integration-process.yaml#L55-L65
Was trying to cache apt, using
path: |
/var/cache/apt/archives/*
!/var/cache/apt/archives/lock
!/var/cache/apt/archives/partial
But it still gives me a permission error for trying to access the partial folder in the Post Cache Dependencies Step:
EACCES: permission denied, scandir '/var/cache/apt/archives/partial'
Thanks for the workaround in https://github.com/actions/toolkit/issues/713#issuecomment-850321461. It works like a charm.
Here are my findings:
# 1. This will cache everything recursively inside `/a` excluding `/a/b`
path: |
/a/*
!/a/b
# 2. This will cache everything recursively inside `/a` including `/a/b`
path: |
/a
!/a/b
# 3. This will cache everything recursively inside `/a` including `/a/b`
# with an unexpectedly huge total cache size (~6x, your mileage might vary)
# and I don't see any visible differences when compared to 1. and 2.
path: |
/a/**
!/a/b
# Conclusion
# Just use 1. for the best of both worlds until the glob issue is fixed.
# Worth mentioning that modifying the value of `path` will affect the caches with the exact same `key`,
# so each cache is unique by its own `key` and `path`.
Hi guys. Was hitting this as well. Some digging in the code about the likely reason:
- https://github.com/actions/toolkit/blob/15e23998268e31520e3d93cbd106bd3228dea77f/packages/cache/src/internal/cacheUtils.ts#L42 is
actions/cachegenerating the list of paths to pass totarfor inclusion. - notice that in the options https://github.com/actions/toolkit/blob/15e23998268e31520e3d93cbd106bd3228dea77f/packages/glob/src/internal-glob-options.ts#L28
machDirectoriesdefaults totrue. - this is the globGenerator that generates the matches
- if a directory was matched, it will be output for. So
tarwill include it (regardless that later some subdirectories are would be excluded... the parent dir is already passed totar).
(Note: can observe this by enabling step debug logging, see https://docs.github.com/en/actions/monitoring-and-troubleshooting-workflows/enabling-debug-logging)
Other observation: the negative patterns must come after the positive one, otherwise a last positive match will always win (see https://github.com/actions/toolkit/blob/15e23998268e31520e3d93cbd106bd3228dea77f/packages/glob/src/internal-pattern-helper.ts#L65).
Fix suggestion: pass matchDirectories as false in https://github.com/actions/toolkit/blob/15e23998268e31520e3d93cbd106bd3228dea77f/packages/cache/src/internal/cacheUtils.ts#L46 - worst downside is that empty directories won't be tarred up, which sounds fine for caching use-case.
In the mean time, the workaround of expanding the paths with /* to the level of the excludes is doable as others greatly suggested. But is inconvenient and has some hard to cover edge cases.
Just wasted way too much time trying to figure this out. GitHub should really pay more attention to the inconsistencies of glob patterns between different parts of Actions.
Note that none of the fixes here work if the levels are deeper and there are multiple paths that need to be matched, for example:
# Works
path: |
/a/*
!/a/b
# Doesn't work
path: |
/a/*
!/a/b/c
# Works
path: |
/a/*/*
!/a/b/c
path: |
/a/*/*
!/a/b/c
!/a/b/c/d # No longer works