makisu icon indicating copy to clipboard operation
makisu copied to clipboard

failure to push cache

Open geekflyer opened this issue 6 years ago • 20 comments

Makisu fails to push the cache layers with this error

{"level":"info","ts":1544131874.1233864,"msg":"Stored cacheID mapping to KVStore: 63ee1ad1 => MAKISU_CACHE_EMPTY"}
{"level":"error","ts":1544131874.2301095,"msg":"Failed to push cache: push layer sha256:11c7c4087b02b2f738627b3a4edd566b0189c8cea77a8b2dc3aac1fb1b240194: check layer exists: gcr.io/makisu/cache (sha256:11c7c4087b02b2f738627b3a4edd566b0189c8cea77a8b2dc3aac1fb1b240194): check manifest exists: HEAD https://gcr.io/v2/makisu/cache/blobs/sha256:11c7c4087b02b2f738627b3a4edd566b0189c8cea77a8b2dc3aac1fb1b240194 401; push layer sha256:1dbcab28ce46b65c0174e5e82658492107396fead31e9144c343e6bc96e471c7: check layer exists: gcr.io/makisu/cache (sha256:1dbcab28ce46b65c0174e5e82658492107396fead31e9144c343e6bc96e471c7): check manifest exists: HEAD https://gcr.io/v2/makisu/cache/blobs/sha256:1dbcab28ce46b65c0174e5e82658492107396fead31e9144c343e6bc96e471c7 401"}

The command I executed was

docker run -i --rm --network host \
        -v /var/run/docker.sock:/docker.sock \
        -e DOCKER_HOST=unix:///docker.sock \
        -v $(pwd):/makisu-context \
        -v /tmp/makisu-storage:/makisu-storage \
        gcr.io/makisu-project/makisu:v0.1.3 build --modifyfs=true --load -registry-config=/makisu-context/registry.yaml -t="gcr.io/my-project/christian-playground/makisu-test" --push=gcr.io /makisu-context

My registry config looks like this:

"gcr.io":
  "my-project/*":
    push_chunk: -1
    security:
      basic:
        username: oauth2accesstoken
        password: somethingsecret

This error also occurs when attempting to use a redis based remote cache key value store. Btw makisu successfully pushes the image layers, but not the cache layers. Also it creates a bunch of cache entries with the value MAKISU_CACHE_EMPTY (both locally and in redis).

From the error logs it appears that makisu attempts to push the cache images to the wrong repository (gcr.io/makisu/cache instead of gcr.io/my-project/makisu/cache).

geekflyer avatar Dec 06 '18 21:12 geekflyer

@geekflyer do you mind trying again with v0.1.4?

yiranwang52 avatar Dec 06 '18 23:12 yiranwang52

distributed caching seems to work now! thanks. It still creates some cache entries with value MAKISU_CACHE_EMPTY though - is that expected?

geekflyer avatar Dec 07 '18 18:12 geekflyer

That's expected, for lines that's not RUN or COPY and shouldn't generate layers.

yiranwang52 avatar Dec 07 '18 19:12 yiranwang52

Sorry to dig this out again, but I am also seeing this issue, even with v0.1.6-3-g6fe4bf9.

I am attempting to build an image and push layers to a self-hosted docker registry (2.6.2). This is what I get:

/ # makisu build --push="registry.foo.bar" --modifyfs=true -t="$REMOTE_TAG" --registry-config=/registry-config.yaml /test
{"level":"info","ts":1545085460.4784067,"msg":"Starting Makisu build (version=v0.1.4-8-g812ef5d)"}
{"level":"info","ts":1545085460.478659,"msg":"Using build context: /test"}
{"level":"info","ts":1545085460.4789588,"msg":"Using file at /makisu-storage/cache_key_value.json for cacheID storage"}
{"level":"info","ts":1545085460.4792151,"msg":"* Stage 1/1 : (alias=0,latestfetched=-1)"}
{"level":"info","ts":1545085460.479267,"msg":"* Step 1/1 (commit) : FROM alpine  (ac0036c5)"}
{"level":"info","ts":1545085460.479531,"msg":"* Started pulling image index.docker.io/library/alpine:latest"}
{"level":"info","ts":1545085462.4828181,"msg":"* Started pulling layer index.docker.io/library/alpine:sha256:4fe2ade4980c2dda4fc95858ebb981489baec8c1e4bd282ab1c3560be8ff9bde"}
{"level":"info","ts":1545085462.9276593,"msg":"* Started pulling layer index.docker.io/library/alpine:sha256:196d12cf6ab19273823e700516e98eb1910b03b17840f9d5509f03858484d321"}
{"level":"info","ts":1545085463.3377192,"msg":"* Finished pulling layer library/alpine:196d12cf6ab19273823e700516e98eb1910b03b17840f9d5509f03858484d321"}
{"level":"info","ts":1545085463.6449273,"msg":"* Finished pulling layer library/alpine:4fe2ade4980c2dda4fc95858ebb981489baec8c1e4bd282ab1c3560be8ff9bde"}
{"level":"info","ts":1545085463.6458313,"msg":"* Finished pulling image index.docker.io/library/alpine:latest in 3.166240545s"}
{"level":"info","ts":1545085463.6487741,"msg":"* Processing FROM layer 4fe2ade4980c2dda4fc95858ebb981489baec8c1e4bd282ab1c3560be8ff9bde"}
{"level":"info","ts":1545085463.705377,"msg":"* Merged 469 headers from tar to memfs"}
{"level":"info","ts":1545085463.7054472,"msg":"* Execute FROM alpine  (ac0036c5) took 3.226027813s"}
{"level":"info","ts":1545085463.7055995,"msg":"* Committed gzipped layer sha256:4fe2ade4980c2dda4fc95858ebb981489baec8c1e4bd282ab1c3560be8ff9bde (2206931 bytes)"}
{"level":"info","ts":1545085463.7056513,"msg":"* Pushing with cache ID ac0036c5"}
{"level":"info","ts":1545085463.7058036,"msg":"* Moving directories [] to /makisu-storage/sandbox/sandbox821884972/stages/MA=="}
{"level":"error","ts":1545085463.9665835,"msg":"Failed to push cache: push layer sha256:4fe2ade4980c2dda4fc95858ebb981489baec8c1e4bd282ab1c3560be8ff9bde: check layer exists: registry.foo.bar/pascal/makisutest (sha256:4fe2ade4980c2dda4fc95858ebb981489baec8c1e4bd282ab1c3560be8ff9bde): check manifest exists: HEAD https://registry.foo.bar/v2/pascal/makisutest/blobs/sha256:4fe2ade4980c2dda4fc95858ebb981489baec8c1e4bd282ab1c3560be8ff9bde 401"}
{"level":"info","ts":1545085463.9678257,"msg":"Computed total image size 2206931","total_image_size":2206931}
{"level":"info","ts":1545085463.9678564,"msg":"Successfully built image pascal/makisutest:abc123"}
{"level":"info","ts":1545085463.9680684,"msg":"* Started pushing image registry.foo.bar/pascal/makisutest:abc123"}
2018/12/17 22:24:24 Command failure: failed to push image: failed to push image: check manifest exists for image registry.foo.bar/pascal/makisutest:abc123: check manifest exists: HEAD https://registry.foo.bar/v2/pascal/makisutest/manifests/abc123 401

(sorry, log is from a run with v0.1.4 but I reconfirmed with 0.1.6)

My registry-config.yaml:

registry.foo.bar:
  .*:
    security:
      tls:
        client:
          disabled: false
      basic:
        username: foo
        password: bar

pehlert avatar Dec 17 '18 22:12 pehlert

@pehlert do you mind sharing how you configured your registry? so we repro the issue.

yiranwang52 avatar Dec 17 '18 23:12 yiranwang52

@yiranwang52 Not at all!

Here is what I see in the log files:

10.244.1.0 - - [17/Dec/2018:23:41:57 +0000] "HEAD /v2/pascal/makisutest/blobs/sha256:4fe2ade4980c2dda4fc95858ebb981489baec8c1e4bd282ab1c3560be8ff9bde HTTP/1.1" 401 161 "" "Go-http-client/1.1"
time="2018-12-17T23:41:57Z" level=warning msg="error authorizing context: basic authentication challenge for realm \"basic-realm\": invalid authorization credential" go.version=go1.7.6 http.request.host=registry.foo.bar http.request.id=d29c82d4-38d3-4b78-8273-ea570fa26f48 http.request.method=HEAD http.request.remoteaddr=77.47.103.235 http.request.uri="/v2/pascal/makisutest/manifests/latest" http.request.useragent="Go-http-client/1.1" instance.id=e660abef-2f0e-46cf-becc-d8100bee4b62 vars.name="pascal/makisutest" vars.reference=latest version=v2.6.2
10.244.1.0 - - [17/Dec/2018:23:41:57 +0000] "HEAD /v2/pascal/makisutest/manifests/latest HTTP/1.1" 401 161 "" "Go-http-client/1.1"

This is the configuration:

    version: 0.1
    log:
      fields:
        service: registry
    storage:
      cache:
        blobdescriptor: inmemory
      filesystem:
        rootdirectory: /var/lib/registry
    http:
      addr: :5000
      headers:
        X-Content-Type-Options: [nosniff]
    auth:
      htpasswd:
        realm: basic-realm
        path: /auth/htpasswd
    health:
      storagedriver:
        enabled: true
        interval: 10s
        threshold: 3

This is all experimental, so I can change config for testing at any time.

pehlert avatar Dec 17 '18 23:12 pehlert

@pehlert hmm could you try docker login to your registry? I couldnt login to mine (using the same config). Wonder if you need to setup certs as well...

evelynl94 avatar Dec 18 '18 06:12 evelynl94

Hi @evelynl94!

docker login works perfectly fine - that's in fact the first thing I confirmed. Even a push using the docker CLI works.

I may need to mention that this is behind an nginx ingress which provides the SSL certs.

pehlert avatar Dec 18 '18 07:12 pehlert

Any idea what else I could try to debug this? makisu would be a real game changer for us as I hope it will reduce build times significantly. But with this issue, we can't use it at all unfortunately.

pehlert avatar Dec 22 '18 12:12 pehlert

@pehlert sorry things were going a little slow over here for the past few days due to holiday season. We will try to resolve this next week.

yiranwang52 avatar Dec 22 '18 13:12 yiranwang52

Hey there. Had this same issue. Looks like the problem goes away when I switch the basePingQuery to use https.

https://github.com/uber/makisu/blob/2312b399473f4d588e6c797e9b43b6492a7a5b83/lib/registry/security/basicauth.go#L30

I suspect that some registries used to respond using http, but now are either returning 301 or 302.

gcr.io returns:

* Rebuilt URL to: gcr.io/
*   Trying 172.217.197.82...
* TCP_NODELAY set
* Connected to gcr.io (172.217.197.82) port 80 (#0)
> GET / HTTP/1.1
> Host: gcr.io
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Location: https://gcr.io/
< Date: Thu, 10 Jan 2019 04:44:09 GMT
< Content-Type: text/html; charset=UTF-8
< Server: Docker Registry
< Content-Length: 212
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< 
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://gcr.io/">here</A>.
</BODY></HTML>
* Connection #0 to host gcr.io left intact

My internal registry returns:

$ curl -v http://docker.mycompany.com/v2/
*   Trying 10.10.10.10...
* TCP_NODELAY set
* Connected to docker.mycompany.com (10.10.10.10) port 80 (#0)
> GET /v2/ HTTP/1.1
> Host: docker.mycompany.com
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 302 Found
< Cache-Control: no-cache
< Content-length: 0
< Location: https://docker.mycompany.com/v2/
< Connection: close
< 
* Closing connection 0

A working ping response would look like:

$ curl -v https://gcr.io
* Rebuilt URL to: https://gcr.io/
*   Trying 172.217.197.82...
* TCP_NODELAY set
* Connected to gcr.io (172.217.197.82) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=*.googlecode.com
*  start date: Dec  4 09:33:00 2018 GMT
*  expire date: Feb 26 09:33:00 2019 GMT
*  subjectAltName: host "gcr.io" matched cert's "gcr.io"
*  issuer: C=US; O=Google Trust Services; CN=Google Internet Authority G3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fc46980c600)
> GET / HTTP/2
> Host: gcr.io
> User-Agent: curl/7.54.0
> Accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 302 
< location: https://cloud.google.com/container-registry/
< content-length: 241
< date: Thu, 10 Jan 2019 04:46:52 GMT
< content-type: text/html; charset=UTF-8
< server: Docker Registry
< x-xss-protection: 1; mode=block
< x-frame-options: SAMEORIGIN
< alt-svc: quic=":443"; ma=2592000; v="44,43,39,35"
< 
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://cloud.google.com/container-registry/">here</A>.
</BODY></HTML>
* Connection #0 to host gcr.io left intact

visualphoenix avatar Jan 10 '19 04:01 visualphoenix

@visualphoenix we change http to https in Request.Scheme if you configure that in --registry-config. That being said, I dont think this const string matters...

The code also follows 301 (this is done by default golang http library) because some registries serve http and redirects to https (DockerHub and gcr for example).

Please share your registry configuration.

evelynl94 avatar Jan 10 '19 05:01 evelynl94

Not sure if this helps much, but I tried to disable https for my registry temporarily and am still seeing the same errors.

pehlert avatar Jan 10 '19 06:01 pehlert

@evelynl94 following is my registry config:

"docker.mycompany.com":
  ".*":
    security:
      tls:
        ca:
          cert:
            path: /etc/pki/tls/cert.pem
      basic:
        username: user
        password: apikey

@evelynl94 i confirm i already have your patch from 2312b399473f4d588e6c797e9b43b6492a7a5b83 but it doesn't help.

build command was:

./makisu build --registry-config=/home/centos/registry.yaml -t="docker.mycompany.com/sre/baseami:test" --push=docker.mycompany.com /home/centos/image/

After patching https://github.com/uber/makisu/blob/2312b399473f4d588e6c797e9b43b6492a7a5b83/lib/registry/security/basicauth.go#L30 to the following, the image pushes:

basePingQuery         = "https://%s/v2/"

visualphoenix avatar Jan 10 '19 16:01 visualphoenix

@visualphoenix can you share any log from the registry? The auth client should send a https request first then fallback to http if there is a network error.

evelynl94 avatar Jan 10 '19 18:01 evelynl94

@visualphoenix Just to clarify, are you saying that if you change:

basePingQuery         = "http://%s/v2/"

to

basePingQuery         = "https://%s/v2/"

then everything works?

yiranwang52 avatar Jan 10 '19 18:01 yiranwang52

@yiranwang52 correct

visualphoenix avatar Jan 10 '19 19:01 visualphoenix

I suspect there is something else missing from the code which is not handling the 302 FOUND Location: https://docker.mycompany.com/v2/ redirection

< HTTP/1.1 302 Found
< Cache-Control: no-cache
< Content-length: 0
< Location: https://docker.mycompany.com/v2/
< Connection: close

visualphoenix avatar Jan 10 '19 19:01 visualphoenix

Not entirely sure if this is the same issue, but it seems the same because of the 401s I've attached my full build log to help, hopefully. I ran makisu built from master to take advantage of the --preserve-root flag as I'm running in jenkins-kubernetes.

makisu-build-fail.log

Angry-Potato avatar May 14 '19 15:05 Angry-Potato

For the record, 401 errors where introduced by https://github.com/uber/makisu/commit/25ee10e2f2b801cb542976a28c6f9660a07b2c60 and fixed by https://github.com/uber/makisu/commit/92443c6b46d0fed7473eb2f71a739d265ecf9016 (its confirmed by @Angry-Potato that the later fixed his 401 issue).

Rowern avatar May 15 '19 11:05 Rowern