Tree artifact up-to-dateness check can be very slow for large tree artifacts
Description of the bug:
When checking whether a local action cache entry is up-to-date, it takes a long time to check actions that have large tree artifacts among their inputs. The stack trace when Bazel is working on this is:
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(java.base@.../Native Method)
at java.io.FileInputStream.read(java.base@.../Unknown Source)
at com.google.common.io.ByteStreams.copy(ByteStreams.java:114)
at com.google.common.io.ByteSource.copyTo(ByteSource.java:257)
at com.google.common.io.ByteSource.hash(ByteSource.java:340)
at com.google.devtools.build.lib.vfs.FileSystem.getDigest(FileSystem.java:339)
at com.google.devtools.build.lib.unix.UnixFileSystem.getDigest(UnixFileSystem.java:452)
at com.google.devtools.build.lib.vfs.Path.getDigest(Path.java:690)
at com.google.devtools.build.lib.vfs.DigestUtils.manuallyComputeDigest(DigestUtils.java:194)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.constructFileArtifactValue(ActionMetadataHandler.java:564)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.constructFileArtifactValueFromFilesystem(ActionMetadataHandler.java:496)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.lambda$constructTreeArtifactValueFromFilesystem$0(ActionMetadataHandler.java:354)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler$$Lambda$1121/0x0000000800857040.visit(Unknown Source)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:411)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:414)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:414)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:414)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:414)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:414)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:414)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue.visitTree(TreeArtifactValue.java:393)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.constructTreeArtifactValueFromFilesystem(ActionMetadataHandler.java:342)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.getTreeArtifactValue(ActionMetadataHandler.java:317)
at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.getMetadata(ActionMetadataHandler.java:265)
at com.google.devtools.build.lib.actions.ActionCacheChecker.getMetadataOrConstant(ActionCacheChecker.java:566)
at com.google.devtools.build.lib.actions.ActionCacheChecker.getMetadataMaybe(ActionCacheChecker.java:579)
at com.google.devtools.build.lib.actions.ActionCacheChecker.validateArtifacts(ActionCacheChecker.java:207)
at com.google.devtools.build.lib.actions.ActionCacheChecker.mustExecute(ActionCacheChecker.java:541)
My theory is that this is because the visitation happens on a single thread in TreeArtifactValue.visitTree() when called from ActionMetadataHandler.constructTreeArtifactValueFromFilesystem().
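To illustrate the theory, here is a minimal Java sketch (an assumed shape, not Bazel's actual code) of a sequential depth-first visitation: all per-file work happens on the calling thread, so the total time is the sum of every stat and digest.

import java.io.File;
import java.util.function.Consumer;

// Minimal sketch of a single-threaded recursive tree visitation, assuming a
// shape like TreeArtifactValue.visitTree; this is not Bazel's actual code.
final class SequentialTreeVisit {
  static void visitTree(File dir, Consumer<File> visitor) {
    File[] children = dir.listFiles();
    if (children == null) {
      return; // not a directory, or unreadable
    }
    for (File child : children) {
      if (child.isDirectory()) {
        visitTree(child, visitor); // recursion stays on the calling thread
      } else {
        visitor.accept(child); // e.g. stat + digest; blocks until done
      }
    }
  }
}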
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Run the following script:
touch WORKSPACE
mkdir -p r
cat > r/BUILD <<'EOF'
load(":r.bzl", "r")

r(name = "ta")

genrule(
    name = "c",
    srcs = [":ta"],
    outs = ["co"],
    cmd = "find $(location :ta) > $@",
)

sh_binary(
    name = "gen",
    srcs = ["gen.sh"],
)
EOF
cat > r/r.bzl <<'EOF'
def _r_impl(ctx):
    ta = ctx.actions.declare_directory("d")
    ctx.actions.run(
        outputs = [ta],
        inputs = [],
        executable = ctx.executable._gen,
        arguments = [ta.path],
    )
    return [DefaultInfo(files = depset([ta]))]

r = rule(
    implementation = _r_impl,
    attrs = {
        "_gen": attr.label(default = "//r:gen", executable = True, cfg = "exec"),
    },
)
EOF
cat > r/gen.sh <<'EOF'
#!/bin/bash
OUT="$1"
mkdir -p "$OUT"
for i in $(seq 1 10); do
  for j in $(seq 1 10); do
    mkdir -p "$OUT/$i/$j"
    for k in $(seq 1 100); do
      #echo "$i $j $k" > "$OUT/$i/$j/$k"
      dd if=/dev/random of="$OUT/$i/$j/$k" bs=1024 count=1024
    done
  done
done
echo hello > "$OUT/hello"
EOF
chmod +x r/gen.sh
bazel build //r:c
bazel shutdown
bazel build //r:c # This is slow
Which operating system are you running Bazel on?
Linux @ Google
What is the output of bazel info release?
development version
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
From git commit de4746da2394e49d028b5327ee5e65c6b60011cc.
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD?
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response
@coeuvre Might this be a possible application of Loom? Perhaps io_uring might also help here?

> Might this be a possible application of Loom?

Probably, by visiting the tree concurrently using virtual threads (sketched below). It's not easy to do today because we have already created many platform threads.

> Perhaps io_uring might also help here?

Definitely, but integrating io_uring into Bazel is another big project.
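For concreteness, here is a minimal sketch of what visiting the tree with virtual threads could look like. This is hypothetical code (the class and method names are invented), not what any actual fix implements.

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Stream;

// Hypothetical sketch of a Loom-based tree visitation: one virtual thread
// per file, so blocking stats/digests don't occupy platform threads.
final class VirtualThreadTreeVisit {
  static void visitTree(Path root) throws IOException {
    // The executor's close() waits for all submitted tasks (Java 21+).
    try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor();
        Stream<Path> files = Files.walk(root)) {
      files.filter(Files::isRegularFile)
          .forEach(file -> pool.submit(() -> statAndDigest(file)));
    }
  }

  private static void statAndDigest(Path file) {
    try {
      // Stand-in for the real per-file stat + digest work.
      BasicFileAttributes unused = Files.readAttributes(file, BasicFileAttributes.class);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}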
Created https://github.com/bazelbuild/bazel/pull/17943 as a prototype to fix this.
@bazel-io fork 6.2.0
We ran into this again. This time, the stack traces show:
"tree-artifact-visitor-46" #1800 [144416] daemon prio=5 os_prio=0 cpu=28027.44ms elapsed=12463.90s tid=0x00007f
b740001e70 nid=144416 runnable [0x00007fbb95efe000]
java.lang.Thread.State: RUNNABLE
at com.google.devtools.build.lib.vfs.inmemoryfs.InMemoryFileSystem.stat(InMemoryFileSystem.java:360)
at com.google.devtools.build.lib.remote.RemoteActionFileSystem.statUnchecked(RemoteActionFileSystem.java:612)
at com.google.devtools.build.lib.remote.RemoteActionFileSystem.resolveOneLink(RemoteActionFileSystem.java:533)
at com.google.devtools.build.lib.vfs.FileSystem.appendSegment(FileSystem.java:377)
at com.google.devtools.build.lib.vfs.FileSystem.resolveSymbolicLinks(FileSystem.java:445)
<the same frame repeated about ten times>
at com.google.devtools.build.lib.vfs.FileSystem.resolveSymbolicLinks(FileSystem.java:445)
at com.google.devtools.build.lib.remote.RemoteActionFileSystem.statUnchecked(RemoteActionFileSystem.java:594)
at com.google.devtools.build.lib.remote.RemoteActionFileSystem.stat(RemoteActionFileSystem.java:582)
at com.google.devtools.build.lib.remote.RemoteActionFileSystem.statIfFound(RemoteActionFileSystem.java:564)
at com.google.devtools.build.lib.vfs.Path.statIfFound(Path.java:322)
at com.google.devtools.build.lib.skyframe.ActionOutputMetadataStore.fileArtifactValueFromArtifact(ActionOutputMetadataStore.java:599)
at com.google.devtools.build.lib.skyframe.ActionOutputMetadataStore.constructFileArtifactValue(ActionOutputMetadataStore.java:461)
at com.google.devtools.build.lib.skyframe.ActionOutputMetadataStore.constructFileArtifactValueFromFilesystem(ActionOutputMetadataStore.java:449)
at com.google.devtools.build.lib.skyframe.ActionOutputMetadataStore.lambda$constructTreeArtifactValueFromFilesystem$0(ActionOutputMetadataStore.java:298)
at com.google.devtools.build.lib.skyframe.ActionOutputMetadataStore$$Lambda/0x00000008009c39f0.visit(Unknown Source)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue$Visitor.visitTree(TreeArtifactValue.java:548)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue$Visitor.lambda$visitTree$1(TreeArtifactValue.java:551)
at com.google.devtools.build.lib.skyframe.TreeArtifactValue$Visitor$$Lambda/0x00000008009c4508.run(Unknown Source)
at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:414)
This is indicative of RemoteActionFileSystem being slow, even when called from multiple threads.
I believe there are two distinct problems here:
- RemoteActionFileSystem#resolveSymbolicLinks is slow because it stats paths on the underlying filesystem one component at a time and there's no cache. So if you have a tree artifact with N files and every file has M path components, collecting the TreeArtifactValue incurs O(N*M) stats. Adding a cache is possible (see the sketch after this list), but the invalidation logic is nontrivial: we cannot assume that a path statted in the past won't change in the future, because files can be deleted or moved around within an action execution.
- TreeArtifactValue.visitTree parallelizes over subdirectories, not individual files. So if the tree artifact has a very large subdirectory with N files (or a single flat directory, in the worst case), collecting the TreeArtifactValue is still as slow as O(N) sequential stats.
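To make the caching idea concrete, here is a rough sketch of such a "dentry cache". ComponentResolver and resolveOneComponent are invented stand-ins for RemoteActionFileSystem's real per-component resolution, and invalidation, the genuinely hard part, is deliberately omitted.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a "dentry cache" over per-component symlink
// resolution. Cache invalidation is omitted, so this is only indicative:
// a real implementation must invalidate prefixes when the tree mutates.
final class DentryCache {
  interface ComponentResolver {
    /** Resolves a single child name under an already-resolved parent path. */
    String resolveOneComponent(String resolvedParent, String name);
  }

  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final ComponentResolver resolver;

  DentryCache(ComponentResolver resolver) {
    this.resolver = resolver;
  }

  /**
   * Resolves "a/b/c" one component at a time. With the cache, N paths of M
   * components that share prefixes cost roughly O(unique prefixes)
   * resolutions instead of O(N*M).
   */
  String resolve(String path) {
    String resolvedPrefix = "";
    for (String component : path.split("/")) {
      if (component.isEmpty()) {
        continue; // skip a leading slash or repeated separators
      }
      final String parent = resolvedPrefix;
      resolvedPrefix = cache.computeIfAbsent(
          parent + "/" + component,
          key -> resolver.resolveOneComponent(parent, component));
    }
    return resolvedPrefix;
  }
}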
I recently did some measurements on a large tree artifact containing 240k files totaling 12GB, with large subdirectories of ~4k files (internal b/323077002 has the details), and I determined the following:
- Currently, TreeArtifactValue.visitTree takes ~40s (~160us/file).
- Adding a "dentry cache" for resolveSymbolicLinks brings it down to ~9s (disclaimer: prototype quality; doing it properly with correct invalidation might be slower).
- Parallelizing across files instead of subdirectories (in addition to the dentry cache; sketched below) brings it down to ~6s (~25us/file).
- However, parallelizing across files does nothing on its own; it only compounds with the dentry cache.
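And here is a sketch of the per-file fan-out, again hypothetical rather than the actual patch: files are enumerated up front and each one becomes its own task, so a single huge flat directory no longer serializes the walk.

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Stream;

// Hypothetical sketch of per-file (rather than per-subdirectory) fan-out.
final class PerFileParallelVisit {
  static void visitTree(Path root, int threads) throws Exception {
    List<Callable<Object>> tasks;
    try (Stream<Path> files = Files.walk(root)) {
      tasks = files
          .filter(Files::isRegularFile)
          .map(f -> Executors.callable(() -> statAndDigest(f)))
          .toList();
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      pool.invokeAll(tasks); // blocks until every per-file task completes
    } finally {
      pool.shutdown();
    }
  }

  private static void statAndDigest(Path file) {
    try {
      // Stand-in for the real per-file stat + digest work.
      BasicFileAttributes unused = Files.readAttributes(file, BasicFileAttributes.class);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}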
After applying both optimizations, the trace profile shows that visitTree is able to use all of the available cores during most of its execution, suggesting that they're sufficient to avoid blocking on I/O; further gains might be possible on the CPU side, but I haven't looked into it closely.
This is on a 32-core machine with an SSD, so numbers may vary depending on available parallelism and I/O performance.
I think this has now been optimized as well as it reasonably can be.