
actions/cache/save does not fail the step on error

Open Kaltenbach opened this issue 9 months ago • 4 comments

actions/cache/save recently (2025-03-26) failed with a timeout, followed by a second error. It looks like the key had already been created by the initial attempt, so the retry could not reserve it:

Attempt 1 of 5 failed with error: Request timeout: /twirp/github.actions.results.api.v1.CacheService/CreateCacheEntry. Retrying request in 3000 ms...
Failed to save: Unable to reserve cache with key Linux-build-workspace-930, another job may be creating this cache.
Warning: Cache save failed.

The action is configured as follows:

    - name: Cache workspace
      uses: actions/cache/save@v4
      env:
        cache-name: persist-workspace
      with:
        path: |
          ...
        key: ${{ runner.os }}-build-workspace-${{ github.run_number }}

Current behavior: the step's conclusion is success:

        {
          "name": "Persist build workspace",
          "status": "completed",
          "conclusion": "success",
          "number": 19,
          "started_at": "2025-03-26T02:17:23Z",
          "completed_at": "2025-03-26T02:17:40Z"
        }

Expected behavior: the action (and job) fails if the cache cannot be created. If this is not generally wanted, a parameter to fail the step/job when cache creation fails would be very welcome.
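Until such a parameter exists, one possible workaround is a follow-up step that checks whether the cache entry actually exists and fails the job otherwise. The following is a minimal, untested sketch, not part of actions/cache: the step name and the `CACHE_KEY` variable are illustrative, and it assumes the `gh` CLI is available on the runner and that the job token is allowed to read the repository's caches (`actions: read`) through the REST endpoint that lists them.

```yaml
- name: Verify cache was saved   # hypothetical step, placed after the save step
  env:
    GH_TOKEN: ${{ github.token }}
    # Must be the same key expression as in the save step
    CACHE_KEY: ${{ runner.os }}-build-workspace-${{ github.run_number }}
  run: |
    # List cache entries whose key starts with CACHE_KEY,
    # then require an exact, whole-line match on the key.
    if ! gh api "repos/${GITHUB_REPOSITORY}/actions/caches?key=${CACHE_KEY}" \
         --jq '.actions_caches[].key' | grep -Fxq "$CACHE_KEY"; then
      echo "::error::Cache entry ${CACHE_KEY} not found after save"
      exit 1
    fi
```

The `key` query parameter matches by prefix, so `grep -Fxq` is used to accept only the exact key. Whether the listing also shows a key that was reserved but never uploaded has not been verified here.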

Please note: this is a sporadic error that we cannot reproduce. We had not noticed this behavior before the "Deprecation Notice - Upgrade to latest before February 1st 2025".

Kaltenbach, Mar 27 '25 16:03

We have a similar problem in https://github.com/kyamagu/skia-python/pull/313: cache creation failed over a span of 3 days around 1st April. It had succeeded only a few days earlier, with no code change in between (just updating READMEs and the internal version number for a release).

HinTak, Apr 04 '25 18:04

We still run into this issue from time to time. It would be great if someone could take care of this. 🙏

Kaltenbach, Jul 09 '25 08:07

We just had a big outage in production due to this.

Image

And to make matters worse, it soft-locked the cache key, preventing any further writes to it. This is another screenshot taken 12h after that first one:

Image

It basically made a soft-lock on the cache key, making it impossible to write to it.

cc'ing @nebuk89 @vsvipul @joshmgross @kotewar

Luc45, Jul 30 '25 14:07

> It basically made a soft-lock on the cache key, making it impossible to write to it.

We also noticed that the key gets locked. We have a cleanup action for some caches which, in this scenario, fails to clean up the cache because no cache is found. The key nevertheless remains locked. I would expect the key to be cleaned up as well if storing the cache fails.

Kaltenbach, Jul 31 '25 06:07