[Flake] kaniko flake (5xx) fetching image from production.cloudflare.docker.com

Open devjgm opened this issue 4 years ago • 34 comments

Log: https://pantheon.corp.google.com/cloud-build/builds;region=global/a7edfd7e-c43e-43c2-a3d6-3ddba4565084;step=0?project=cloud-cpp-testing-resources

Digest: sha256:a12a027e1d0afbeb6cc31bb07e89d94dc47fa768265416350d442d878bdb6064
Status: Downloaded newer image for gcr.io/kaniko-project/executor:edge
gcr.io/kaniko-project/executor:edge
INFO[0000] Resolved base name centos:7 to devtools      
INFO[0000] Using dockerignore file: /workspace/ci/.dockerignore 
INFO[0000] Retrieving image manifest centos:7           
INFO[0000] Retrieving image manifest centos:7           
error building image: GET https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/86/8652b9f0cb4c0599575e5a003f5906876e10c1ceb2ab9fe1786712dac14a50cf/data?verify=REDACTED: unsupported status code 503; body: <!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Temporarily unavailable | production.cloudflare.docker.com | Cloudflare</title></title>
...

It looks like our kaniko build step that was creating our image got a 503 when fetching one of the layers from docker.com. I'm not sure if there's anything we can do to fix this. I think not.

But I'm filing this issue anyway so we can track if it's a common issue.

devjgm avatar May 03 '21 15:05 devjgm

Seems like this could be useful:

https://github.com/GoogleContainerTools/kaniko/blob/master/README.md#--registry-mirror

coryan avatar May 03 '21 15:05 coryan

I tried adding --registry-mirror=mirror.gcr.io, and it did not work. From what I could gather from the error messages, mirror.gcr.io does not host a number of the images we need (fedora:33 or ubuntu:bionic).

There is a way to create our own mirror and host it, but that seems very involved.

coryan avatar May 04 '21 12:05 coryan

This one is similar enough that I think we should consolidate them:

https://pantheon.corp.google.com/cloud-build/builds;region=global/9ef522b9-59b8-4ca9-9a71-0d5789a497ee;step=0?project=cloud-cpp-testing-resources

error building image: GET https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/ae/aefd7f02ae24739b95f77c488de70465c54653f394097b9859acede976c80e03/data?verify=REDACTED: unsupported status code 502; body: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>cloudflare</center>
</body>
</html>

coryan avatar May 08 '21 01:05 coryan

I figured out why --registry-mirror=mirror.gcr.io does not work: it only holds the "latest" tag of each popular image. Bummer.

coryan avatar May 10 '21 14:05 coryan

A different solution may involve using the "warmer" program:

https://github.com/GoogleContainerTools/kaniko/tree/master/cmd/warmer

This can download the base image to /cache, which could be a volume shared between the warmer and the kaniko steps.

There are a couple of additional twists:

  • Saying something like fedora:33 or ubuntu:bionic requires a roundtrip to the Docker Hub registry, because the images behind those tags may (and do) change.
  • These roundtrips to the registry are what fail with 5xx errors.
  • We can avoid them (to some degree) if we pin a SHA and say FROM fedora:33@sha256:ab9c680acef5a053cf2a6bddcebfa9674576d5104927180ef27a35d2dbab15fc
  • Note that using the SHA saves a roundtrip to the registry even if we do not use the warmer. In other words, it is one less chance to get snake eyes when rolling the network dice.
  • Both the warmer and the kaniko steps would need to download the same SHA. The warmer takes a Docker image name, not a Dockerfile, as input, which suggests we need a script to extract the version from the Dockerfile.
  • We can have the Renovate bot update the SHA. That in itself seems interesting, because maybe we do not want updates to the OS unless we run the tests first.
  • Note that the warmer step would still need to download the base image, so we would save roundtrips to the registry but still have a download.
  • Maybe we can have that /cache directory really cached as a GCS tarball (sure would be nice if kaniko did that instead).
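
The "script to extract the version from the Dockerfile" mentioned above could look roughly like the following. This is a minimal sketch and not something that exists in the repository: the function names `base_images` and `is_pinned` are made up for illustration, and it assumes simple Dockerfiles (no `ARG` substitution in the image name, no line continuations inside `FROM`).

```python
import re

# Matches "FROM [--platform=...] <image> [AS <stage>]" (case-insensitive).
_FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?\s*$",
    re.IGNORECASE,
)


def base_images(dockerfile_text):
    """Return the external images named by FROM lines, in order, without duplicates."""
    images = []
    stages = set()
    for line in dockerfile_text.splitlines():
        match = _FROM_RE.match(line)
        if match is None:
            continue
        image, stage = match.group(1), match.group(2)
        # "FROM <earlier stage>" and "FROM scratch" do not need a download.
        if image.lower() not in stages and image != "scratch" and image not in images:
            images.append(image)
        if stage is not None:
            stages.add(stage.lower())
    return images


def is_pinned(image):
    """Return True if the image reference carries a sha256 digest."""
    return "@sha256:" in image
```

Each name returned by `base_images` is what would be handed to the warmer step, and `is_pinned` could be used to flag Dockerfiles that still resolve a mutable tag.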

coryan avatar May 10 '21 14:05 coryan

No repeats in 90d, closing. I suspect we will need to reopen though.

coryan avatar Sep 23 '21 13:09 coryan

https://console.cloud.google.com/cloud-build/builds;region=us-east1/99910c10-0808-429c-9f10-4262c19416ba?project=cloud-cpp-testing-resources

Step #0: error building image: GET https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/dc/dcf4d4bef137f695d11ed187ba6a135362dca3de36955c4da0905d596ce521bc/data?verify=REDACTED: unexpected status code 502 Bad Gateway: <html>
Step #0: <head><title>502 Bad Gateway</title></head>
Step #0: <body>
Step #0: <center><h1>502 Bad Gateway</h1></center>
Step #0: <hr><center>cloudflare</center>
Step #0: </body>
Step #0: </html>

devbww avatar Feb 19 '22 22:02 devbww

https://pantheon.corp.google.com/cloud-build/builds;region=us-east1/472c831b-dd84-4d4d-98a5-7274a880d8b3;step=0?project=cloud-cpp-testing-resources

error building image: error building stage: failed to get filesystem from image: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/7c/7c3b88808835aa80f1ef7f03083c5ae781d0f44e644537cd72de4ce6c5e62e00/data?verify=1647437859-y6QANhynFYEw1DOP0fkes9J%2F4eY%3D": read tcp 192.168.10.3:54192->104.18.124.25:443: read: connection reset by peer

devjgm avatar Mar 16 '22 14:03 devjgm

Maybe this can help:

https://github.com/GoogleContainerTools/kaniko/blob/v1.8.0/README.md#--image-fs-extract-retry
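
For context, the general idea behind a retry option like this one is to repeat a transient network operation a few times before giving up. The sketch below illustrates that idea only; it is not kaniko's implementation. The `retry` helper and its parameters are hypothetical, and the exponential backoff between attempts is an assumption, since nothing in this thread says how kaniko spaces its retries.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on a network-style error, retry up to `attempts` more times.

    The delay doubles after every failed try (base_delay, 2*base_delay, ...).
    The last error is re-raised once the retries are exhausted.
    """
    for attempt in range(attempts + 1):
        try:
            return operation()
        except OSError:  # includes ConnectionResetError ("connection reset by peer")
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable only so the helper can be exercised without actually waiting.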

coryan avatar Mar 16 '22 14:03 coryan

Nice find. Can't hurt. Let's give it a shot: https://github.com/googleapis/google-cloud-cpp/pull/8558

devjgm avatar Mar 16 '22 21:03 devjgm

This was a 404: https://console.cloud.google.com/cloud-build/builds;region=us-east1/a24f0fba-ce12-43bd-923f-d0a1964f1442?project=936212892354

dbolduc avatar May 07 '22 12:05 dbolduc

90d without a repeat, closing.

coryan avatar Aug 06 '22 16:08 coryan

CONSOLE_LOG_URL: https://console.cloud.google.com/cloud-build/builds;region=us-east1/bdf61932-1f3a-48f0-9a11-aec2059c0b25;tab=detail?project=cloud-cpp-testing-resources
RAW_LOG_URL: https://storage.googleapis.com/cloud-cpp-community-publiclogs/logs/google-cloud-cpp/main/c33459e640286d3ff0c7a51c1f66bc6e0d2acb36/demo-debian-bullseye-demo-install/log-bdf61932-1f3a-48f0-9a11-aec2059c0b25.txt

dbolduc avatar Jun 12 '23 19:06 dbolduc

Slightly different error message, but I think the same root cause. I am changing the title to be more generic.

https://pantheon.corp.google.com/cloud-build/builds;region=us-east1/542ff150-ac6a-4420-8692-b084b5c5e189?project=cloud-cpp-testing-resources&mods=logs_tg_prod

Step #0: error building image: error building stage: failed to execute command: extracting fs from image: read tcp 192.168.10.2:52280->142.251.162.207:443: read: connection reset by peer

coryan avatar Sep 06 '23 15:09 coryan

https://storage.cloud.google.com/cloud-cpp-community-publiclogs/logs/google-cloud-cpp/12747/73b2c7b43aa5faa972bddffdf524ff72ed81f9fc/fedora-latest-cxx14-cxx14/log-1f83671f-909d-470d-9591-5e4a7d585647.txt

dbolduc avatar Sep 27 '23 23:09 dbolduc

https://pantheon.corp.google.com/cloud-build/builds;region=us-east1/94fd1e7a-7f1a-4952-bec3-29a5cd2466a2;step=0?e=-13802955&mods=logs_tg_prod&project=cloud-cpp-testing-resources

coryan avatar Oct 12 '23 12:10 coryan

https://console.cloud.google.com/cloud-build/builds;region=us-east1/9b967be5-c8a8-4871-9c83-407660b496dc?project=cloud-cpp-testing-resources

devbww avatar Oct 17 '23 02:10 devbww

https://console.cloud.google.com/cloud-build/builds;region=us-east1/226f040c-17ef-4b65-b68d-d410e818372a?project=936212892354

error building image: error building stage: failed to execute command: extracting fs from image: read tcp 192.168.10.2:57314->74.125.26.207:443: read: connection reset by peer

devbww avatar Oct 20 '23 00:10 devbww

FWIW: this seems to be:

https://github.com/GoogleContainerTools/kaniko/issues/1717

This may also be of use, but requires a lot more configuration:

https://cloud.google.com/artifact-registry/docs/repositories/remote-repo

coryan avatar Oct 20 '23 01:10 coryan

https://storage.googleapis.com/cloud-cpp-community-publiclogs/logs/google-cloud-cpp/main/7cb1ec3ee393fefe431bea87a872063c78f722cc/fedora-msan-msan/log-2e3dd95c-73bc-4a4c-a66c-6ee3a20c9201.txt

dbolduc avatar Oct 30 '23 21:10 dbolduc

https://storage.googleapis.com/cloud-cpp-community-publiclogs/logs/google-cloud-cpp/main/0cf8a7a77daca027399b1f590c1c1cd02ef2a4cb/fedora-latest-cmake-check-api/log-fd678bdb-61ce-4bef-9605-75e5db60582e.txt

error building image: error building stage: failed to execute command: extracting fs from image: unexpected EOF
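
The messages reported in this issue so far fall into three shapes: a 5xx status code, "connection reset by peer", and "unexpected EOF". To track how often the flake occurs, a raw log could be checked with something like the sketch below. The patterns are taken from the messages quoted in this thread, and the function name `is_image_fetch_flake` is hypothetical; other kaniko versions may word these errors differently.

```python
import re

# Patterns derived from the error lines quoted in this issue.
_FLAKE_PATTERNS = [
    re.compile(r"error building image: .*(?:unsupported|unexpected) status code 5\d\d"),
    re.compile(r"error building image: .*connection reset by peer"),
    re.compile(r"error building image: .*unexpected EOF"),
]


def is_image_fetch_flake(log_text):
    """Return True if any line of the log matches one of the known flake messages."""
    return any(
        pattern.search(line)
        for line in log_text.splitlines()
        for pattern in _FLAKE_PATTERNS
    )
```

Note that a 404, like the one reported earlier in this thread, is deliberately not matched, since it is not a 5xx or connection error.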

dbolduc avatar Nov 02 '23 17:11 dbolduc

Attempting to fix upstream (https://github.com/GoogleContainerTools/kaniko/pull/2837)

alevenberg avatar Nov 09 '23 17:11 alevenberg

https://pantheon.corp.google.com/cloud-build/builds;region=us-east1/163e2878-c35f-4dfb-bfae-7770b55dc1d8;step=0?e=-13802955&mods=logs_tg_prod&project=cloud-cpp-testing-resources

coryan avatar Nov 17 '23 13:11 coryan

error building image: error building stage: failed to execute command: extracting fs from image: read tcp 192.168.10.2:60614->108.177.12.207:443: read: connection reset by peer
  • https://console.cloud.google.com/cloud-build/builds;region=us-east1/bd1d974c-8a94-4974-a6c6-13c7ea37051f;tab=detail?project=cloud-cpp-testing-resources
  • https://storage.googleapis.com/cloud-cpp-community-publiclogs/logs/google-cloud-cpp/main/3e2644a95cd2c40daf103ddcae7ab43d04548ae2/fedora-latest-bazel-asan/log-bd1d974c-8a94-4974-a6c6-13c7ea37051f.txt

dbolduc avatar Dec 12 '23 03:12 dbolduc

I tried :(

alevenberg avatar Dec 13 '23 15:12 alevenberg

https://console.cloud.google.com/cloud-build/builds;region=us-east1/81179ad8-943b-4d47-9f5b-e5ad5c7ebeea?project=cloud-cpp-testing-resources&pli=1&e=-13802955&mods=dataproc_env_prod

devbww avatar Dec 26 '23 19:12 devbww

CONSOLE_LOG_URL: https://console.cloud.google.com/cloud-build/builds;region=us-east1/8f5cd35a-5c37-4d18-8f82-97283c00d633;tab=detail?project=cloud-cpp-testing-resources
RAW_LOG_URL: https://storage.googleapis.com/cloud-cpp-community-publiclogs/logs/google-cloud-cpp/main/94355b0efb9e087c241c2be3cca062e818d5a7dd/fedora-latest-bazel-tsan/log-8f5cd35a-5c37-4d18-8f82-97283c00d633.txt

dbolduc avatar Jan 31 '24 19:01 dbolduc

Step #0: error building image: error building stage: failed to execute command: extracting fs from image: read tcp 192.168.10.2:45946->74.125.196.207:443: read: connection reset by peer

Build FAILURE: libcxx-ci https://console.cloud.google.com/cloud-build/builds;region=us-east1/aea5600b-a3a6-4159-a043-3ae3d52d8dac?project=936212892354

https://storage.googleapis.com/cloud-cpp-community-publiclogs/logs/google-cloud-cpp/main/a988f3e39ca5134b6966578c3db5da07e1147156/fedora-latest-bazel-libcxx-default/log-aea5600b-a3a6-4159-a043-3ae3d52d8dac.txt

alevenberg avatar Feb 13 '24 16:02 alevenberg

https://console.cloud.google.com/cloud-build/builds;region=us-east1/93b44eb8-5bd9-4ca1-910f-6e8d5d0e09d7?project=936212892354

coryan avatar Feb 20 '24 19:02 coryan

https://pantheon.corp.google.com/cloud-build/builds;region=us-east1/f20dda71-8239-469b-b434-01a37df4349d?project=cloud-cpp-testing-resources&e=VertexAiColabPublicPreviewLaunch::VertexAiColabPublicPreviewEnabled&mods=logs_tg_prod

coryan avatar Feb 20 '24 22:02 coryan