
`remove_image` incorrectly maps `Temporary failure in name resolution` to `DockerResponseNotFoundError`

Open Veetaha opened this issue 3 years ago • 4 comments

This has appeared in my stress test of create/remove_image methods (https://github.com/fussybeaver/bollard/issues/190#issuecomment-1038940591).

[2022-02-14T06:50:11Z DEBUG bollard::read] Decoding JSON line from stream: {"errorDetail":{"message":"error pulling image configuration: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/39/39d290d0ed044e20481d8a02dfb84509e48426e99a7175d1412299db837818eb/data?verify=1644824401-2%2BNoBGk55KVtN7yebBVVT%2B%2Fmd1A%3D\": dial tcp: lookup production.cloudflare.docker.com: Temporary failure in name resolution"},"error":"error pulling image configuration: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/39/39d290d0ed044e20481d8a02dfb84509e48426e99a7175d1412299db837818eb/data?verify=1644824401-2%2BNoBGk55KVtN7yebBVVT%2B%2Fmd1A%3D\": dial tcp: lookup production.cloudflare.docker.com: Temporary failure in name resolution"}
[2022-02-14T06:50:11Z DEBUG bollard::uri] Parsing uri: unix://2f7661722f72756e2f646f636b65722e736f636b/images/amazon/dynamodb-local:1.18.0, client_type: Unix, socket: /var/run/docker.sock
[2022-02-14T06:50:11Z DEBUG bollard::docker] unix://2f7661722f72756e2f646f636b65722e736f636b/images/amazon/dynamodb-local:1.18.0
[2022-02-14T06:50:11Z DEBUG bollard::docker] request: Request { method: DELETE, uri: unix://2f7661722f72756e2f646f636b65722e736f636b/images/amazon/dynamodb-local:1.18.0, version: HTTP/1.1, headers: {"x-registry-auth": "eyJ1c2VybmFtZSI6bnVsbCwicGFzc3dvcmQiOm51bGwsImF1dGgiOm51bGwsImVtYWlsIjpudWxsLCJzZXJ2ZXJhZGRyZXNzIjpudWxsLCJpZGVudGl0eXRva2VuIjpudWxsLCJyZWdpc3RyeXRva2VuIjpudWxsfQ==", "content-type": "application/json"}, body: Body(Empty) }
thread 'bollard::luck_testing' panicked at 'called `Result::unwrap()` on an `Err` value: DockerResponseNotFoundError { message: "{\"message\":\"No such image: amazon/dynamodb-local:1.18.0\"}\n" }', crates/foo/src/bollard.rs:32:54

The reported error should instead be mapped to an error variant that signifies "network error", so that users can reliably retry it.

Veetaha avatar Feb 14 '22 10:02 Veetaha

I was briefly looking into this, and I think it's worth putting some work into changing how errors are handled in Bollard - but aside from the status code that's returned from the server, we don't really know what JSON structure an error contains...

Nevertheless, let's see if we can handle this one. Maybe we can try to deserialise some errors and, if that fails, fall back to dumping out a string as we do currently.
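A minimal sketch of that "try to deserialise, otherwise keep the raw string" idea, using only the standard library. The names `ErrorBody`, `extract_message` and `parse_error_body` are hypothetical and not part of Bollard. The hand-rolled extraction only stands in for a real JSON library and assumes the body has the `{"message": "..."}` shape seen in the log above.

```rust
/// Hypothetical result of inspecting an error response body.
#[derive(Debug, PartialEq)]
pub enum ErrorBody {
    /// The body had a recognisable `"message"` field.
    Message(String),
    /// The body could not be interpreted; keep it verbatim.
    Raw(String),
}

/// Naive extraction of a `"message"` string value.
/// Assumption: a real implementation would use a JSON parser instead;
/// `\uXXXX` escapes are not handled here.
fn extract_message(body: &str) -> Option<String> {
    let key = "\"message\":";
    let start = body.find(key)? + key.len();
    let rest = body[start..].trim_start();
    let mut chars = rest.strip_prefix('"')?.chars();
    let mut out = String::new();
    while let Some(c) = chars.next() {
        match c {
            // Unescaped quote closes the string value.
            '"' => return Some(out),
            // Minimal escape handling: \n, \t, and pass-through for \" \\ \/.
            '\\' => match chars.next()? {
                'n' => out.push('\n'),
                't' => out.push('\t'),
                other => out.push(other),
            },
            other => out.push(other),
        }
    }
    None // unterminated string: treat as not parseable
}

/// Try the structured form first, then fall back to the raw string.
pub fn parse_error_body(body: &str) -> ErrorBody {
    match extract_message(body) {
        Some(message) => ErrorBody::Message(message),
        None => ErrorBody::Raw(body.to_string()),
    }
}
```

With the 404 body from the log, `parse_error_body` would yield `ErrorBody::Message` holding the `No such image: ...` text, while a body that is not in that shape is kept unchanged as `ErrorBody::Raw`.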

However, I couldn't figure out from your logs whether that was the result of a single API request. Could you post the code that generated this log?

fussybeaver avatar Feb 22 '22 14:02 fussybeaver

The original code that generated this error is available in this comment: https://github.com/fussybeaver/bollard/issues/190#issuecomment-1038940591. This log comes from remove_image.

But that is a stress test. A lighter weight way to reproduce this is probably by running remove_image with the internet connection disabled.

Veetaha avatar Feb 23 '22 03:02 Veetaha

Hmm..

It looks like there are two Docker instructions happening (the read looks like a BuildInfo type, which is returned from a build_image request) - are you sure that it's the same code as in https://github.com/fussybeaver/bollard/issues/190#issuecomment-1038940591?

I suspect the first instruction deserialised correctly and emitted a network error as part of the payload, but execution continued because the result stream was a 200 Docker response. That let the second instruction run, which failed for reasons unrelated to the network. The remove_image API request doesn't involve the Docker registry - it just connects to the local daemon to remove the image. In this case, a 404 makes sense, because the image does not exist on the local daemon.

At this point, I wonder whether one should catch error messages coming from a successful result stream and fail at some point - or whether that would be a little over-engineered.
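As a rough sketch of what "fail on an error inside a successful stream" could look like: the snippet below uses only the standard library, and `ProgressLine`, `StreamError`, `classify` and `check_stream` are hypothetical names, not Bollard types. The substring matching used to spot network failures is an assumption based on the message in the log above, not a documented contract of the Docker daemon.

```rust
/// Hypothetical stand-in for one decoded line of a pull/build progress stream.
#[derive(Debug, Clone, Default)]
pub struct ProgressLine {
    pub status: Option<String>,
    pub error: Option<String>,
}

/// Hypothetical error reported by the stream payload rather than the HTTP status.
#[derive(Debug, PartialEq)]
pub enum StreamError {
    /// Likely transient; callers may choose to retry.
    Network(String),
    /// Any other error carried in the payload.
    Other(String),
}

/// Heuristic only: matching on message substrings is an assumption.
fn classify(message: &str) -> StreamError {
    const NETWORK_HINTS: [&str; 3] = [
        "Temporary failure in name resolution",
        "dial tcp",
        "i/o timeout",
    ];
    if NETWORK_HINTS.iter().any(|hint| message.contains(hint)) {
        StreamError::Network(message.to_string())
    } else {
        StreamError::Other(message.to_string())
    }
}

/// Collect status lines, stopping at the first line that carries an error.
pub fn check_stream<I>(lines: I) -> Result<Vec<String>, StreamError>
where
    I: IntoIterator<Item = ProgressLine>,
{
    let mut statuses = Vec::new();
    for line in lines {
        if let Some(message) = line.error {
            return Err(classify(&message));
        }
        if let Some(status) = line.status {
            statuses.push(status);
        }
    }
    Ok(statuses)
}
```

A caller could then retry on `StreamError::Network` and skip the follow-up `remove_image` call entirely, instead of surfacing the unrelated 404.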

fussybeaver avatar Feb 25 '22 17:02 fussybeaver

Yeah, I am definitely sure that came from https://github.com/fussybeaver/bollard/issues/190#issuecomment-1038940591. I was surprised that remove_image would access the network; maybe I misinterpreted the logs there. It could be that create_image failed on a network error, but I didn't see the `Pulling failed` message printed, which made me think the log had come from remove_image.

Veetaha avatar Feb 26 '22 15:02 Veetaha

Related to https://github.com/fussybeaver/bollard/issues/242

fussybeaver avatar Sep 02 '22 12:09 fussybeaver