toolkit icon indicating copy to clipboard operation
toolkit copied to clipboard

Cache - excluding files or folders with ! not working

Open dhadka opened this issue 4 years ago • 7 comments

Describe the bug

Excluding a file or folder does not work as expected with caching. For example,

~/cache
!~/cache/subfolder

is matching:

../../../cache

I think this is because caching uses implicitDescendants: false. Since we're not traversing any descendants, the negation pattern does not match anything.

One option is to just set implicitDescendants: true, but this may add inefficiencies as we will need to enumerate all the files and generate a large "includes file" for tar. We essentially need it to scan the descendants but stop whenever the entire folder is included. For example, the following files:

~/cache/foo.txt
~/cache/bar.txt
~/cache/src/main.c
~/cache/src/lib/baz.dll
~/cache/bin/main.exe

with the glob patterns of ~/cache and !~/cache/bin should produce the list:

~/cache/foo.txt
~/cache/bar.txt
~/cache/src

To Reproduce

See https://github.com/dhadka/cache-test-exclusion

Expected behavior

Negated files / folders are excluded from the cache.

dhadka avatar Feb 08 '21 15:02 dhadka

Any updates here?

mustaphazorgati avatar May 28 '21 09:05 mustaphazorgati

There is a workaround. Excluding works if you have all patterns in the same "level", for example

          path: |
            ~/.m2/repository/*/*/*
            !~/.m2/repository/org/apache/pulsar

this is what we use in apache/pulsar, full example here:

https://github.com/apache/pulsar/blob/master/.github/workflows/ci-integration-process.yaml#L55-L65

lhotari avatar May 28 '21 10:05 lhotari

There is a workaround. Excluding works if you have all patterns in the same "level", for example

          path: |
            ~/.m2/repository/*/*/*
            !~/.m2/repository/org/apache/pulsar

this is what we use in apache/pulsar, full example here:

https://github.com/apache/pulsar/blob/master/.github/workflows/ci-integration-process.yaml#L55-L65

Was trying to cache apt, using

path: |
    /var/cache/apt/archives/*
    !/var/cache/apt/archives/lock
    !/var/cache/apt/archives/partial

But it still gives me a permission error for trying to access the partial folder in the Post Cache Dependencies Step: EACCES: permission denied, scandir '/var/cache/apt/archives/partial'

swarajpeppermint avatar Jul 01 '21 07:07 swarajpeppermint

Thanks for the workaround in https://github.com/actions/toolkit/issues/713#issuecomment-850321461. It works like a charm.

Here are my findings:

# 1. This will cache everything recursively inside `/a` excluding `/a/b`
path: |
    /a/*
    !/a/b

# 2. This will cache everything recursively inside `/a` including `/a/b`
path: |
    /a
    !/a/b

# 3. This will cache everything recursively inside `/a` including `/a/b`
#    with an unexpectedly huge total cache size (~6x, your mileage might vary)
#    and I don't see any visible differences when compared to 1. and 2.
path: |
    /a/**
    !/a/b

# Conclusion
# Just use 1. for the best of both worlds until the glob issue is fixed.
# Worth mentioning that modifying the value of `path` will affect the caches with the exact same `key`,
# so each cache is unique by its own `key` and `path`.

motss avatar Feb 05 '22 15:02 motss

Hi guys. Was hitting this as well. Some digging in the code about the likely reason:

  • https://github.com/actions/toolkit/blob/15e23998268e31520e3d93cbd106bd3228dea77f/packages/cache/src/internal/cacheUtils.ts#L42 is actions/cache generating the list of paths to pass to tar for inclusion.
  • notice that in the options https://github.com/actions/toolkit/blob/15e23998268e31520e3d93cbd106bd3228dea77f/packages/glob/src/internal-glob-options.ts#L28 machDirectories defaults to true.
  • this is the globGenerator that generates the matches
  • if a directory was matched, it will be output for. So tar will include it (regardless that later some subdirectories are would be excluded... the parent dir is already passed to tar).

(Note: can observe this by enabling step debug logging, see https://docs.github.com/en/actions/monitoring-and-troubleshooting-workflows/enabling-debug-logging)

Other observation: the negative patterns must come after the positive one, otherwise a last positive match will always win (see https://github.com/actions/toolkit/blob/15e23998268e31520e3d93cbd106bd3228dea77f/packages/glob/src/internal-pattern-helper.ts#L65).

Fix suggestion: pass matchDirectories as false in https://github.com/actions/toolkit/blob/15e23998268e31520e3d93cbd106bd3228dea77f/packages/cache/src/internal/cacheUtils.ts#L46 - worst downside is that empty directories won't be tarred up, which sounds fine for caching use-case.

In the mean time, the workaround of expanding the paths with /* to the level of the excludes is doable as others greatly suggested. But is inconvenient and has some hard to cover edge cases.

robinp avatar Feb 05 '22 20:02 robinp

Just wasted way too much time trying to figure this out. GitHub should really pay more attention to the inconsistencies of glob patterns between different parts of Actions.

kamzil avatar Oct 03 '24 10:10 kamzil

Note that none of the fixes here work if the levels are deeper and there are multiple paths that need to be matched, for example:

# Works
path: |
    /a/*
    !/a/b

# Doesn't work
path: |
    /a/*
    !/a/b/c
    
# Works
path: |
    /a/*/*
    !/a/b/c
    
path: |
    /a/*/*
    !/a/b/c
    !/a/b/c/d # No longer works

jonkoops avatar Oct 16 '24 14:10 jonkoops