cache icon indicating copy to clipboard operation
cache copied to clipboard

Cache restore sometimes downloads ~0MB caches

Open matthieu-foucault opened this issue 2 years ago • 4 comments

We use the cache for tools installed with asdf, and for perl libraries. We have a flaky bug where sometimes, the same cache that works fine on a branch doesn't work on another branch, and a ~0MB archive is downloaded. This has been happening randomly over the last 6 months (maybe once every 30/40 pull request).

Example:

On both branches A and B, our install-dev-tools job sees that we haven't updated anything, and gets a cache hit:

Post job cleanup.
Post job cleanup.
Cache hit occurred on the primary key Linux-yarn-cache-07a3008bf228aaf955d98ce084eb8e6b09b623263d80578af5b7d65840fbbbe3-v2, not saving cache.
Post job cleanup.
Cache hit occurred on the primary key Linux-perl-cache-391933be47fcbd2760fbcccebd7446cf5f0f10d80e572e5adeeb9c4f9f62b6e8-v2, not saving cache.
Post job cleanup.
Cache hit occurred on the primary key Linux-asdf-cache-8093192a4cdff5685410dbe3b09b322ef56cc2496b4af9cf83ac7196aafffb7c, not saving cache.

On branch A, the subsequent job, running tests, restores the cache and everything works well:

Run actions/cache@v2
  with:
    path: ~/.asdf
  
    key: Linux-asdf-cache-8093192a4cdff5685410dbe3b09b322ef56cc2496b4af9cf83ac7196aafffb7c
  env:
    PGUSER: postgres
    ASDF_DIR: /home/runner/.asdf
    ASDF_DATA_DIR: /home/runner/.asdf
    PERL5LIB: /home/runner/perl5/lib/perl5
    PERL_LOCAL_LIB_ROOT: /home/runner/perl5
    PERL_MB_OPT: --install_base '/home/runner/perl5'
    PERL_MM_OPT: INSTALL_BASE=/home/runner/perl5
Received 176160768 of 261546746 (67.4%), 167.8 MBs/sec
Received 261546746 of 261546746 (100.0%), 158.1 MBs/sec
Cache Size: ~249 MB (261546746 B)
/usr/bin/tar --use-compress-program zstd -d -xf /home/runner/work/_temp/301a4ee1-882b-4c9d-8005-1ffe324dadbb/cache.tzst -P -C /home/runner/work/cas-cif/cas-cif
Cache restored successfully

Cache restored from key: Linux-asdf-cache-8093192a4cdff5685410dbe3b09b322ef56cc2496b4af9cf83ac7196aafffb7c

Now, on branch B, for some reason, the cache is different, and the job fails later, because it is missing the executable the cache was supposed to restore. Note the difference in cache size

Run actions/cache@v2
  with:
    path: ~/.asdf
  
    key: Linux-asdf-cache-8093192a4cdff5685410dbe3b09b322ef56cc2496b4af9cf83ac7196aafffb7c
  env:
    PGUSER: postgres
    ASDF_DIR: /home/runner/.asdf
    ASDF_DATA_DIR: /home/runner/.asdf
    PERL5LIB: /home/runner/perl5/lib/perl5
    PERL_LOCAL_LIB_ROOT: /home/runner/perl5
    PERL_MB_OPT: --install_base '/home/runner/perl5'
    PERL_MM_OPT: INSTALL_BASE=/home/runner/perl5
Received 413443 of 413443 (100.0%), 5.8 MBs/sec
Cache Size: ~0 MB (413443 B)
/usr/bin/tar --use-compress-program zstd -d -xf /home/runner/work/_temp/77537a38-6e05-4b3b-9f00-eecc3d3b5904/cache.tzst -P -C /home/runner/work/cas-cif/cas-cif
Cache restored successfully

Cache restored from key: Linux-asdf-cache-8093192a4cdff5685410dbe3b09b322ef56cc2496b4af9cf83ac7196aafffb7c

matthieu-foucault avatar May 27 '22 00:05 matthieu-foucault

@matthieu-foucault can you please confirm if both branch A and B are always feature(child) branches? Or is either of them ever a main(master/parent) branch?

kotewar avatar Jun 01 '22 11:06 kotewar

@kotewar Yes, as far as I can remember the issue only happened on feature branches

matthieu-foucault avatar Jun 01 '22 15:06 matthieu-foucault

Similar issue here except it consistently downloads ~0MB for the same key that was previously generated. In my case, it's the main branch creating the key to be used again by the main branch or feature branches off the main branch. The random aspect here is that when the cache key is updated / busted, it will either pull the ~0MB or the correct data, but it will consistently pull the same data for the same key.

Example:

Post job cleanup.
/usr/bin/tar --posix --use-compress-program zstd -T0 -cf cache.tzst -P -C /home/runner/work/***/*** --files-from manifest.txt
Cache Size: ~3 MB (622065 B)
Cache saved successfully
Cache saved with key: lambda-aaf11c3b392621f139bf55dd436f5271a83ea375d54c7e355ffc8bca38dd2491

And then during subsequent workflow runs, it pulls ~0 MB anytime it grabs the key lambda-aaf11c3b392621f139bf55dd436f5271a83ea375d54c7e355ffc8bca38dd2491

Run actions/cache@v2
  with:
    key: lambda-aaf11c3b392621f139bf55dd436f5271a83ea375d54c7e355ffc8bca38dd2491
    path: functions
Received 22 of 22 (100.0%), 0.0 MBs/sec
Cache Size: ~0 MB (22 B)
/usr/bin/tar --use-compress-program zstd -d -xf /home/runner/work/_temp/0891c4dd-c279-456f-8cad-bedb2878e547/cache.tzst -P -C /home/runner/work/***/***
Cache restored successfully
Cache restored from key: lambda-aaf11c3b392621f139bf55dd436f5271a83ea375d54c7e355ffc8bca38dd2491

I'll also note that in this example the same key is pulled in a later job in the first workflow run (the one that initially creates the key) and it correctly grabs ~3 MB. Subsequent workflow runs however that try to grab the key get ~0 MB.

anngelyque avatar Jun 02 '22 21:06 anngelyque

@anngelyque can you please share your workflow file or a sample workflow to simulate the issue. I am not able to replicate the same with the info you've provided.

kotewar avatar Jun 09 '22 12:06 kotewar

Hi @matthieu-foucault, are you still facing this issue?

kotewar avatar Aug 17 '22 10:08 kotewar

Hi @kotewar , just encountered it now in https://github.com/bcgov/CONN-CCBC-portal/runs/8073512335?check_suite_focus=true:

Run actions/cache@v2
  with:
    path: ~/.asdf
  
    key: Linux-asdf-cache-fcc9bf91c47d06add757287d0fb86747258f3ba45ca5cc3de4a6a744ee1a6410
  env:
    PGPASSWORD: postgres
    PGUSER: postgres
    PGDATABASE: ccbc
    ASDF_DIR: /home/runner/.asdf
    ASDF_DATA_DIR: /home/runner/.asdf
    PERL5LIB: /home/runner/perl5/lib/perl5
    PERL_LOCAL_LIB_ROOT: /home/runner/perl5
    PERL_MB_OPT: --install_base '/home/runner/perl5'
    PERL_MM_OPT: INSTALL_BASE=/home/runner/perl5
Received 394868 of 394868 (100.0%), 1.4 MBs/sec
Cache Size: ~0 MB (394868 B)
/usr/bin/tar --use-compress-program zstd -d -xf /home/runner/work/_temp/d6403261-63cd-4882-9d71-178db5f5faa7/cache.tzst -P -C /home/runner/work/CONN-CCBC-portal/CONN-CCBC-portal
Cache restored successfully

matthieu-foucault avatar Aug 29 '22 15:08 matthieu-foucault

Nevermind, the issue my latest message above is expected behaviour due to misconfiguration of our workflow. Closing this issue, I'll open a new one if I see this happening again with a properly configured workflow

matthieu-foucault avatar Aug 29 '22 17:08 matthieu-foucault