
`--experimental_remote_cache_eviction_retries` doesn't evict all metadata causing build failure

Open BalestraPatrick opened this issue 1 year ago • 13 comments

Description of the bug:

Hello!

We have been using `--experimental_remote_cache_eviction_retries=1` together with `--remote_download_toplevel` for the last few weeks. Our data shows that in some cases, after Bazel exits with exit code 39, the retry invocation that follows fails with the exact same error. This does not always happen (we have seen it recover correctly in some situations), but for one specific CppArchive action it fails in about 1% of our builds.

The logs look like the following:

# first invocation
ERROR: /Users/path/BUILD.bazel:122:16: Linking fl.a failed: Failed to fetch blobs because they do not exist remotely.: Missing digest: 6c23fb8f89c7a10035172c6f6fcadcd37a58078ac7f141bce745ee4dee40899c/159248
...
Found remote cache eviction error, retrying the build...
...
# second invocation
ERROR: /Users/path/BUILD.bazel:122:16: Linking fl.a failed: Failed to fetch blobs because they do not exist remotely.: Missing digest: 6c23fb8f89c7a10035172c6f6fcadcd37a58078ac7f141bce745ee4dee40899c/159248

The only ways we have found to recover from this failure are to run `bazel clean` or to disable Build without the Bytes (BwtB, i.e. drop `--remote_download_toplevel`).
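As a stopgap, the recovery described above could be automated with a small wrapper. The following is only a sketch under assumptions, not a fix for the underlying bug: the helper names (`missing_digests`, `retry_hit_same_digest`, `build_with_clean_fallback`) are hypothetical, and it assumes that Bazel writes both the original failure and the retried failure to the same stderr stream, separated by the "Found remote cache eviction error" line shown in the logs above.

```python
import re
import subprocess
import sys

# Matches the "Missing digest: <sha256>/<size>" fragment from the error message.
MISSING_DIGEST = re.compile(r"Missing digest: ([0-9a-f]{64})/(\d+)")
# Line printed between the failed attempt and the automatic retry (taken from the logs).
RETRY_MARKER = "Found remote cache eviction error, retrying the build..."


def missing_digests(log_text):
    """Return the set of (hash, size_bytes) pairs reported as missing."""
    return {(h, int(size)) for h, size in MISSING_DIGEST.findall(log_text)}


def retry_hit_same_digest(log_text):
    """True if a digest missing before the retry is still missing after it."""
    before, marker, after = log_text.partition(RETRY_MARKER)
    if not marker:
        return False  # no retry happened, nothing to compare
    return bool(missing_digests(before) & missing_digests(after))


def build_with_clean_fallback(bazel_args):
    """Run Bazel; if the retry failed on the same digest, clean and rerun once.

    Assumption: Bazel's console output (including both attempts) goes to stderr.
    """
    first = subprocess.run(["bazel", *bazel_args], capture_output=True, text=True)
    sys.stderr.write(first.stderr)
    if first.returncode == 0 or not retry_hit_same_digest(first.stderr):
        return first.returncode
    # Workaround from this report: `bazel clean` drops the stale local metadata.
    subprocess.run(["bazel", "clean"], check=True)
    return subprocess.run(["bazel", *bazel_args]).returncode
```

It would be called as, for example, `build_with_clean_fallback(["build", "//path:lib", "--remote_download_toplevel", "--experimental_remote_cache_eviction_retries=1"])`. Note that `bazel clean` discards all local outputs, so this trades build time for reliability.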

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

In the gRPC log for the second invocation, we see the following:

---------------------------------------------------------
metadata {
  tool_details {
    tool_name: "bazel"
    tool_version: "6.1.0-7ccf98d" <- don't mind this, we build Bazel from source from our own fork and we currently pass the wrong version even though it's based off the 6.2.0 branch.
  }
  action_id: "7d45850557079c80512a065d2250586c4ac53cd91a468cc137e1ca59b3e1fbaa"
  tool_invocation_id: "007220b0-22de-401e-a407-5d6dd56225bf"
  correlated_invocations_id: "cdaeafdf-8dde-44e0-94a6-6dbc3c47c3a6"
  action_mnemonic: "CppArchive"
  target_id: "//path:lib"
  configuration_id: "702272d6c0c9a184ce867b340aaf98a381dfa7e5f316a7a64e42777e966f1345"
}
status {
  code: 1
  message: "com.google.devtools.build.lib.remote.common.CacheNotFoundException: Missing digest: 6c23fb8f89c7a10035172c6f6fcadcd37a58078ac7f141bce745ee4dee40899c/159248"
}
method_name: "google.bytestream.ByteStream/Write"
details {
  write {
  }
}
start_time {
  seconds: 1686831469
  nanos: 922000000
}
end_time {
  seconds: 1686831469
  nanos: 943000000
}

Which operating system are you running Bazel on?

macOS

What is the output of bazel info release?

6.2.0 @ 286306e


BalestraPatrick, Jun 15 '23 15:06