bazel
bazel copied to clipboard
Remote cache warning when using `--experimental_remote_cache_async`
Description of the bug:
When running tests with the remote cache enabled and passing --experimental_remote_cache_async I often see this warning:
WARNING: Remote Cache: .../1/test.err (No such file or directory)
WARNING: Remote Cache: .../2/test.err (No such file or directory)
WARNING: Remote Cache: .../3/test.err (No such file or directory)
WARNING: Remote Cache: .../4/test.err (No such file or directory)
WARNING: Remote Cache: .../5/test.err (No such file or directory)
Where the granularity is per test target. I don't think it shows for every test target but I can see dozens in a single build testing ~600 targets.
Which category does this issue belong to?
No response
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
No response
Which operating system are you running Bazel on?
Linux
What is the output of bazel info release?
7.1.1
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse HEAD ?
No response
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response
there's actually a stack trace in this case too, i'll try to copy it next time
Please share the stack trace!
I think on 7.1.1 it doesn't actually show a trace in all cases
@keith Does it do so with --verbose_failures?
Yep, nice call!
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(Unknown Source)
at java.base/java.io.FileInputStream.<init>(Unknown Source)
at com.google.devtools.build.lib.unix.UnixFileSystem.createFileInputStream(UnixFileSystem.java:497)
at com.google.devtools.build.lib.vfs.AbstractFileSystem.createMaybeProfiledInputStream(AbstractFileSystem.java:96)
at com.google.devtools.build.lib.vfs.AbstractFileSystem.getInputStream(AbstractFileSystem.java:59)
at com.google.devtools.build.lib.vfs.FileSystem$1.openStream(FileSystem.java:355)
at com.google.common.io.ByteSource.copyTo(ByteSource.java:256)
at com.google.common.io.ByteSource.hash(ByteSource.java:340)
at com.google.devtools.build.lib.vfs.FileSystem.getDigest(FileSystem.java:357)
at com.google.devtools.build.lib.unix.UnixFileSystem.getDigest(UnixFileSystem.java:453)
at com.google.devtools.build.lib.vfs.Path.getDigest(Path.java:690)
at com.google.devtools.build.lib.remote.RemoteActionFileSystem.getDigest(RemoteActionFileSystem.java:453)
at com.google.devtools.build.lib.vfs.Path.getDigest(Path.java:690)
at com.google.devtools.build.lib.vfs.DigestUtils.manuallyComputeDigest(DigestUtils.java:220)
at com.google.devtools.build.lib.vfs.DigestUtils.getDigestWithManualFallback(DigestUtils.java:183)
at com.google.devtools.build.lib.remote.util.DigestUtil.compute(DigestUtil.java:91)
at com.google.devtools.build.lib.remote.util.DigestUtil.compute(DigestUtil.java:80)
at com.google.devtools.build.lib.remote.UploadManifest.setStdoutStderr(UploadManifest.java:202)
at com.google.devtools.build.lib.remote.UploadManifest.create(UploadManifest.java:143)
at com.google.devtools.build.lib.remote.RemoteExecutionService.lambda$buildUploadManifestAsync$6(RemoteExecutionService.java:1344)
at io.reactivex.rxjava3.internal.operators.single.SingleFromCallable.subscribeActual(SingleFromCallable.java:43)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleFlatMap.subscribeActual(SingleFlatMap.java:37)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleUsing.subscribeActual(SingleUsing.java:83)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleSubscribeOn$SubscribeOnObserver.run(SingleSubscribeOn.java:89)
at io.reactivex.rxjava3.internal.schedulers.ScheduledDirectTask.call(ScheduledDirectTask.java:38)
at io.reactivex.rxjava3.internal.schedulers.ScheduledDirectTask.call(ScheduledDirectTask.java:25)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
This is trivial to repro. The issue is this logic in StandaloneTestStrategy that appends the test stderr file into the stdout file and then deletes the former. With --experimental_remote_cache_async, this logic runs concurrently with the uploading of the test spawn, and might find that the stderr file has already been deleted.
This is an instance of a more general problem that, in order for async uploading to work correctly, we must uphold the invariant that the outputs of a spawn must never be deleted for the remainder of the execution (or at least until uploading has taken place).
In addition, the test.err files are guaranteed to always be empty, because test-setup.sh redirects stderr to stdout. I believe the merge logic is just a last resort in case the test was set up incorrectly.
...except on Windows, where the test wrapper doesn't redirect. Sigh.
We're seeing this problem as well, and running tests on Windows becomes very flaky with this - due to Windows advanced (ahem) open-files-are-locked-from-deletion semantics, this can even cause build errors (one code path tries to delete the stderr files, the other tries to read them to upload, so the deletion fails).
Funnily, this even happens when using the disk cache, not just a "proper" remote cache.
I guess the workaround for this is
common --experimental_remote_cache_async
test --noexperimental_remote_cache_async
in .bazelrc?
I'm also not sure why this issue is marked as "more data needed" - anything we can help with?
A fix for this issue has been included in Bazel 7.4.0 RC1. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.4.0rc1. Thanks!