buildkit
snapshot extract- does not exist: not found
Reported by @ciaranmcnulty in Slack
#113 [pax-lookup-api php-with-codebase 7/7] RUN --mount=type=cache,target=/root/.composer --mount=source=composer.json,target=composer.json --mount=source=composer.lock,target=composer.lock --mount=from=composer:1,source=/usr/bin/composer,target=/usr/bin/composer composer install --prefer-dist --no-dev
#113 ERROR: snapshot extract-2tnrifjtpyxujo9lliuz0e5bz sha256:f4b82550d9b480f19ed522b76fc7afc0c630fb3f093bcab9a062a57c53c44bda does not exist: not found
@sipsma https://github.com/moby/buildkit/blob/master/cache/refs.go#L1232-L1234 If I'm correct that this snapshot only exists for a couple of calls within this function, does this point to a containerd reference counting problem? Or am I missing something?
https://github.com/moby/buildkit/issues/2988 is potentially similar although not much data.
@ciaranmcnulty You can share the full log just in case. Ideally, daemon debug logs would show whether containerd GC ran at the same time this error appeared.
@tonistiigi Yeah, there's also a check that a lease exists, which creates one if none is present: https://github.com/moby/buildkit/blob/7b2c27c98ce623104caf65846d99774555999fa4/cache/refs.go#L1176-L1183
The only possibility I can think of would be that there is a lease provided by some higher caller that somehow gets released (in a different goroutine) while unlazy is still running. I didn't find any situation where that could happen, but admittedly I didn't search completely exhaustively, so worth a second look.
@sipsma Do you know where the lease that is active here gets created? Or does it depend on the call path? Ideally these temp leases should not depend on the caller.
If it comes from the caller, then I think we should avoid passing a lease context to goroutines; such calls should be synchronous only. If it is the cache record lease, then maybe the right locks that should block any modification to that lease are not being taken.
We could probably get the call path from the full example from @ciaranmcnulty.
Sure, I just need to work out what needs to be redacted from the log, as I'm under a fairly strict NDA.