
404 on last_green

Open KeithMoyer opened this issue 5 years ago • 2 comments

As part of our continuous integration tests, we used to build Bazel from HEAD in order to catch upcoming incompatible_... flag flips that impact us. Once we found Bazelisk, we changed the test to use it with USE_BAZEL_VERSION=last_green. However, Bazelisk sometimes gets a 404 when downloading Bazel.

Example from today:

2019/06/17 15:03:58 Using unreleased version at commit c84f7d39220c88dc44e9725df68805039917d8ed
2019/06/17 15:03:58 Downloading https://storage.googleapis.com/bazel-builds/artifacts/ubuntu1404/c84f7d39220c88dc44e9725df68805039917d8ed/bazel...
2019/06/17 15:03:58 could not download Bazel: HTTP GET https://storage.googleapis.com/bazel-builds/artifacts/ubuntu1404/c84f7d39220c88dc44e9725df68805039917d8ed/bazel failed with error 404

The timing lines up: a new green commit was created at around this time (the times above are UTC), so c84f7d39220c88dc44e9725df68805039917d8ed may have stopped being the "last green" commit while the test was running. Since Bazelisk caches for an hour, subsequent calls to bazelisk kept trying to pull down the build for this commit.

How long are "last green" builds kept available after a new green commit is added? Could this be what caused the issue for us?

KeithMoyer, Jun 17 '19 17:06

@KeithMoyer I can confirm that this happened a few times in the past. I'm looking into how to fix this reliably - we probably have to do this in the backend mechanism that updates the last green commit on CI. I'll update this bug when there is progress.

> How long are "last green" builds kept available after a new green commit is added?

The last green builds are currently kept forever. However, there is a known race condition where the "last green" pointer can get updated before the binaries have been uploaded (because both happen in different pipelines at the moment).
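One way to close that race on the publishing side is to advance the pointer only after every expected binary is confirmed to be present. The following is a minimal Python sketch of that ordering, not the actual CI code: `try_advance_last_green`, the `exists` and `write_pointer` callbacks, and the platform list are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, List


def missing_artifacts(
    commit: str,
    platforms: Iterable[str],
    exists: Callable[[str, str], bool],
) -> List[str]:
    """Return the platforms whose binary for `commit` is not uploaded yet."""
    return [p for p in platforms if not exists(p, commit)]


def try_advance_last_green(
    commit: str,
    platforms: Iterable[str],
    exists: Callable[[str, str], bool],
    write_pointer: Callable[[str], None],
) -> bool:
    """Move the "last green" pointer to `commit` only if all binaries exist.

    `exists(platform, commit)` and `write_pointer(commit)` are hypothetical
    hooks standing in for whatever storage backend the pipeline uses.
    """
    if missing_artifacts(commit, platforms, exists):
        # At least one upload is still pending: leave the old pointer alone.
        return False
    write_pointer(commit)
    return True
```

With this ordering a client can never resolve `last_green` to a commit whose binary for its platform has not been uploaded; the cost is that the pointer lags until the slowest upload finishes.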

philwo, Jun 19 '19 06:06

It's happening again with Bazel 3.1.0 for Linux ARM 64: https://releases.bazel.build/3.1.0/release/bazel-3.1.0-linux-arm64

I'm using Bazelisk version 1.7.3. I've also tried rolling back to 1.7.1 (by npm installing it) and the same issue persists. Is there any sort of workaround? It seems like no matter what version of Bazelisk is used, Bazelisk tries to install the latest Bazel, and there's no option otherwise.

jxnding, Jan 29 '21 05:01

This also affects the CI of the Bazel Central Registry (most recently in https://buildkite.com/bazel/bcr-presubmit/builds/569).

It looks like Bazelisk is using the existence of a directory as the indicator for the availability of a particular version. Is it possible that the build job uploads the individual binaries into this location as they are built rather than atomically after all have been built? Could the build jobs upload into a temporary location and then trigger a final job that moves the bucket directory into the expected location atomically?

Alternatively, perhaps Bazelisk could check the contents of the directory for the particular binary it would download.
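A client-side check for the specific binary, rather than the directory, might look roughly like this. It is a Python sketch and not Bazelisk's actual Go code: the URL layout is copied from the log in the original report, while `first_downloadable_commit` and the idea of a newest-first list of candidate commits are assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

# URL layout taken from the log lines in this issue; treat it as an assumption.
ARTIFACT_URL = (
    "https://storage.googleapis.com/bazel-builds/artifacts/{platform}/{commit}/bazel"
)


def binary_url(platform: str, commit: str) -> str:
    """Build the download URL for one platform's binary at one commit."""
    return ARTIFACT_URL.format(platform=platform, commit=commit)


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request for `url` succeeds with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.URLError:
        # HTTPError (e.g. 404) is a subclass of URLError, so both land here.
        return False


def first_downloadable_commit(
    platform: str,
    commits: Iterable[str],
    exists: Callable[[str], bool] = url_exists,
) -> Optional[str]:
    """Return the first commit (newest first) whose binary is downloadable.

    `commits` is a hypothetical list of recent green commits; how a client
    would obtain it is outside the scope of this sketch.
    """
    for commit in commits:
        if exists(binary_url(platform, commit)):
            return commit
    return None
```

Passing `exists` as a parameter keeps the selection logic testable without network access; the default performs a real HEAD request.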

CC @meteorcloudy

fmeum, Oct 19 '22 06:10

Our pipeline for publishing Bazel binaries had an issue while we were adding Apple Silicon machines. It's resolved now.

meteorcloudy, Oct 19 '22 10:10