bazel icon indicating copy to clipboard operation
bazel copied to clipboard

5.0rc1: Remote cache read failure "java.lang.RuntimeException: java.io.IOException: file.tmp (Not a directory)"

Open brentleyjones opened this issue 2 years ago • 7 comments

Description of the problem / feature request:

We encountered an error where bazel fails to read from the remote cache because it throws an execution locally. Seems to be from this code path: https://github.com/bazelbuild/bazel/commit/cff2ea55dafc57d7d2db813522942e2ee137ed08#diff-c0c72bba110eca25447686392b1f265feabe77c8a53ed7387f70f82bcdaba441R112.

WARNING: Reading from Remote Cache:
--
  | com.google.devtools.build.lib.remote.common.BulkTransferException: io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
  | at com.google.devtools.build.lib.remote.RemoteCache.waitForBulkTransfer(RemoteCache.java:195)
  | at com.google.devtools.build.lib.remote.RemoteExecutionService.downloadOutputs(RemoteExecutionService.java:971)
  | at com.google.devtools.build.lib.remote.RemoteSpawnCache.lookup(RemoteSpawnCache.java:124)
  | at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:141)
  | at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:108)
  | at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:47)
  | at com.google.devtools.build.lib.exec.SpawnStrategyResolver.beginExecution(SpawnStrategyResolver.java:68)
  | at com.google.devtools.build.lib.analysis.actions.SpawnAction.beginExecution(SpawnAction.java:328)
  | at com.google.devtools.build.lib.actions.Action.execute(Action.java:134)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$5.execute(SkyframeActionExecutor.java:909)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1078)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1033)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:152)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:91)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:496)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:856)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:349)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:169)
  | at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:590)
  | at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
  | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  | at java.base/java.lang.Thread.run(Unknown Source)
  | Suppressed: java.io.IOException: io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
  | at com.google.devtools.build.lib.remote.GrpcCacheClient.lambda$downloadBlob$13(GrpcCacheClient.java:320)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.doFallback(AbstractCatchingFuture.java:192)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.doFallback(AbstractCatchingFuture.java:179)
  | at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:124)
  | at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
  | at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
  | at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
  | at com.google.common.util.concurrent.AbstractFuture.setFuture(AbstractFuture.java:814)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:203)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:179)
  | at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:133)
  | at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
  | at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
  | at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
  | at com.google.common.util.concurrent.AbstractFuture.setFuture(AbstractFuture.java:814)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:203)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:179)
  | at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:133)
  | at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
  | at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
  | at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
  | at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
  | at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:53)
  | at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onError(GrpcCacheClient.java:376)
  | at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:478)
  | at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
  | at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
  | at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
  | at com.google.devtools.build.lib.remote.NetworkTimeInterceptor$NetworkTimeCall$1.onClose(NetworkTimeInterceptor.java:81)
  | at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
  | at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
  | at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
  | at com.google.devtools.build.lib.remote.ReferenceCountedChannel$ConnectionCleanupCall$1.onClose(ReferenceCountedChannel.java:90)
  | at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
  | at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
  | at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
  | at com.google.devtools.build.lib.remote.logging.LoggingInterceptor$LoggingForwardingCall$1.onClose(LoggingInterceptor.java:157)
  | at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:557)
  | at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:69)
  | at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:738)
  | at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:717)
  | at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
  | at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
  | ... 3 more
  | Suppressed: java.io.IOException: /private/var/tmp/_bazel_iosci/7c81b0540945e51fddc7bd01ee8d0abb/execroot/lyftios/bazel-out/darwin-fastbuild/bin/tools/CodeAnalysis/Tests/LinuxMain.swift.tmp (Not a directory)
  | at com.google.devtools.build.lib.unix.NativePosixFiles.openWrite(Native Method)
  | at com.google.devtools.build.lib.unix.UnixFileSystem.createFileOutputStream(UnixFileSystem.java:493)
  | at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:174)
  | at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:188)
  | at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:425)
  | at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:413)
  | at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.ensureOpen(LazyFileOutputStream.java:66)
  | at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.close(LazyFileOutputStream.java:60)
  | at com.google.devtools.build.lib.remote.RemoteCache$ReportingOutputStream.close(RemoteCache.java:527)
  | at com.google.devtools.build.lib.remote.RemoteCache$3.onFailure(RemoteCache.java:400)
  | at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1066)
  | at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
  | at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
  | at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
  | at com.google.common.util.concurrent.AbstractFuture.setFuture(AbstractFuture.java:814)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:203)
  | at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.setResult(AbstractCatchingFuture.java:179)
  | at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:133)
  | ... 42 more
  | Caused by: io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
  | at io.grpc.Status.asRuntimeException(Status.java:535)
  | ... 22 more
  | Caused by: java.lang.RuntimeException: java.io.IOException: /private/var/tmp/_bazel_iosci/7c81b0540945e51fddc7bd01ee8d0abb/execroot/lyftios/bazel-out/darwin-fastbuild/bin/tools/CodeAnalysis/Tests/LinuxMain.swift.tmp (Not a directory)
  | at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onNext(GrpcCacheClient.java:356)
  | at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onNext(GrpcCacheClient.java:347)
  | at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:465)
  | at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
  | at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
  | at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
  | at com.google.devtools.build.lib.remote.logging.LoggingInterceptor$LoggingForwardingCall$1.onMessage(LoggingInterceptor.java:138)
  | at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:656)
  | at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:641)
  | ... 5 more
  | Caused by: java.io.IOException: /private/var/tmp/_bazel_iosci/7c81b0540945e51fddc7bd01ee8d0abb/execroot/lyftios/bazel-out/darwin-fastbuild/bin/tools/CodeAnalysis/Tests/LinuxMain.swift.tmp (Not a directory)
  | at com.google.devtools.build.lib.unix.NativePosixFiles.openWrite(Native Method)
  | at com.google.devtools.build.lib.unix.UnixFileSystem.createFileOutputStream(UnixFileSystem.java:493)
  | at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:174)
  | at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:188)
  | at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:425)
  | at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:413)
  | at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.ensureOpen(LazyFileOutputStream.java:66)
  | at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.write(LazyFileOutputStream.java:42)
  | at com.google.devtools.build.lib.remote.RemoteCache$ReportingOutputStream.write(RemoteCache.java:510)
  | at com.google.devtools.build.lib.remote.util.DigestOutputStream.write(DigestOutputStream.java:58)
  | at java.base/java.io.FilterOutputStream.write(Unknown Source)
  | at com.google.protobuf.ByteString$LiteralByteString.writeTo(ByteString.java:1381)
  | at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onNext(GrpcCacheClient.java:352)
  | ... 13 more

LinuxMain.swift is single file output from a genrule.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I don't know of one.

What operating system are you running Bazel on?

macOS 11.6.1

What's the output of bazel info release?

release 5.0.0rc1

brentleyjones avatar Nov 04 '21 19:11 brentleyjones

According to the stack trace:

 java.io.IOException: /private/var/tmp/_bazel_iosci/7c81b0540945e51fddc7bd01ee8d0abb/execroot/lyftios/bazel-out/darwin-fastbuild/bin/tools/CodeAnalysis/Tests/LinuxMain.swift.tmp (Not a directory)
  | at com.google.devtools.build.lib.unix.NativePosixFiles.openWrite(Native Method)
  | at com.google.devtools.build.lib.unix.UnixFileSystem.createFileOutputStream(UnixFileSystem.java:493)
  | at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:174)
  | at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:188)
  | at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:425)
  | at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:413)
  | at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.ensureOpen(LazyFileOutputStream.java:66)
  | at com.google.devtools.build.lib.remote.common.LazyFileOutputStream.write(LazyFileOutputStream.java:42)
  | at com.google.devtools.build.lib.remote.RemoteCache$ReportingOutputStream.write(RemoteCache.java:510)
  | at com.google.devtools.build.lib.remote.util.DigestOutputStream.write(DigestOutputStream.java:58)
  | at java.base/java.io.FilterOutputStream.write(Unknown Source)
  | at com.google.protobuf.ByteString$LiteralByteString.writeTo(ByteString.java:1381)
  | at com.google.devtools.build.lib.remote.GrpcCacheClient$1.onNext(GrpcCacheClient.java:352)
  | ... 13 more

it calls into NativePosixFiles.openWrite() which calls open which throws "Not a directory" or ENOTDIR.

According to the doc, ENOTDIR means "A component of the path prefix is not a directory". So I am guessing the issue is the output directory is not prepared probably for the action.

coeuvre avatar Nov 05 '21 04:11 coeuvre

Might related https://github.com/bazelbuild/bazel/issues/6393.

coeuvre avatar Nov 05 '21 04:11 coeuvre

@brentleyjones Is that possible to create a repro?

coeuvre avatar Nov 05 '21 04:11 coeuvre

It's weird, the parent directories are actually created before the download.

coeuvre avatar Nov 05 '21 07:11 coeuvre

It was on CI. All I can think of is the directories failed to be created, because of low disk space.

brentleyjones avatar Nov 05 '21 11:11 brentleyjones

We're seeing what I think is the same issue with bazel 5.1.1:

[1,147 / 1,149] Compiling test/extensions/network/dns_resolver/apple/apple_dns_impl_test.cc; 1s remote-cache
ERROR: /Users/runner/work/1/s/test/extensions/network/dns_resolver/apple/BUILD:11:14: Compiling test/extensions/network/dns_resolver/apple/apple_dns_impl_test.cc failed: Exec failed due to IOException: 41 errors during bulk transfer:
java.io.IOException: io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
java.io.IOException: io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.

Unfortunately I think we're hiding the full stack trace https://dev.azure.com/cncf/envoy/_build/results?buildId=106443&view=logs&jobId=fa3d3e18-6969-5713-c3e7-3581195704fd&j=fa3d3e18-6969-5713-c3e7-3581195704fd&t=ba370bef-ca49-52f5-0c93-7c0c0f27c465 (this log will probably rotate)

keith avatar Apr 19 '22 21:04 keith

Hit this bug today when trying to use

build --strategy=GoLink=linux-sandbox,remote
build --strategy=GoCompile=linux-sandbox,remote

so that I could do GoLinking locally while doing the rest of my build on RBE.

Not sure if that information is helpful

mrmeku avatar Aug 04 '22 21:08 mrmeku

A lot of things have changed since 5. Please reopen if it's still happening.

coeuvre avatar Oct 20 '23 13:10 coeuvre