bazel
bazel copied to clipboard
Skyframe invalidation with disk or remote cache
Description of the bug:
When using a disk cache (possibly also a remote cache), Bazel repeatedly marks output files obtained from the cache as dirty to trigger a download, but then doesn't download the files, so they are still missing on the next invocation.
Which category does this issue belong to?
No response
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
- Patch Bazel with the diff further below.
- Repeatedly run
baze --nosystem_rc --nohome_rc build --disk_cache=some_path //...on some project with Java targets.
See lines such as:
artifact is dirty: File:[[<execution_root>]bazel-out/darwin_arm64-fastbuild/bin]jeydtzir/libjeydtzir-hjar.jdeps RemoteFileArtifactValueWithExpiration{digest=0x7FACF6298A30C6CB373A988E3A3683CD3E8E9C46C59ECC570F089867234A3FEF, size=23, locationIndex=1, materializationExecPath=null, expireAtEpochMilli=1715713661059}
Invalidating: [ActionLookupData1{actionLookupKey=ConfiguredTargetKey{label=//vkprdlnk:vkprdlnk, config=BuildConfigurationKey[b85621dae172f731976165fe86631f99030f401e70ed2b02d58c69595e5ea36d]}, actionIndex=1}, ActionLookupData0{actionLookupKey=ConfiguredTargetKey{label=//wztejdxc:wztejdxc, config=BuildConfigurationKey[b85621dae172f731976165fe86631f99030f401e70ed2b02d58c69595e5ea36d]}, actionIndex=0}, ActionLookupData0{actionLookupKey=ConfiguredTargetKey{label=@@rules_java~//toolchains:platformclasspath, config=BuildConfigurationKey[b85621dae172f731976165fe86631f99030f401e70ed2b02d58c69595e5ea36d]}, actionIndex=0}, ActionLookupData1{actionLookupKey=ConfiguredTargetKey{label=//jeydtzir:jeydtzir, config=BuildConfigurationKey[b85621dae172f731976165fe86631f99030f401e70ed2b02d58c69595e5ea36d]}, actionIndex=1}, ActionLookupData0{actionLookupKey=ConfiguredTargetKey{label=//vkprdlnk:vkprdlnk, config=BuildConfigurationKey[b85621dae172f731976165fe86631f99030f401e70ed2b02d58c69595e5ea36d]}, actionIndex=0}, ActionLookupData1{actionLookupKey=ConfiguredTargetKey{label=//mynbiqpm:mynbiqpm, config=BuildConfigurationKey[b85621dae172f731976165fe86631f99030f401e70ed2b02d58c69595e5ea36d]}, actionIndex=1}, ActionLookupData1{actionLookupKey=ConfiguredTargetKey{label=//zjplsgqe:zjplsgqe, config=BuildConfigurationKey[b85621dae172f731976165fe86631f99030f401e70ed2b02d58c69595e5ea36d]}, actionIndex=1}, ActionLookupData1{actionLookupKey=ConfiguredTargetKey{label=@@rules_java~//toolchains:platformclasspath, config=BuildConfigurationKey[b85621dae172f731976165fe86631f99030f401e70ed2b02d58c69595e5ea36d]}, actionIndex=1}, ActionLookupData0{actionLookupKey=ConfiguredTargetKey{label=//jeydtzir:jeydtzir, config=BuildConfigurationKey[b85621dae172f731976165fe86631f99030f401e70ed2b02d58c69595e5ea36d]}, actionIndex=0}, ActionLookupData1{actionLookupKey=ConfiguredTargetKey{label=//wztejdxc:wztejdxc, config=BuildConfigurationKey[b85621dae172f731976165fe86631f99030f401e70ed2b02d58c69595e5ea36d]}, actionIndex=1}] com.google.devtools.build.skyframe.InvalidatingNodeVisitor$DirtyingInvalidationState@104d38f9
Which operating system are you running Bazel on?
Linux
What is the output of bazel info release?
No response
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse HEAD ?
No response
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
diff --git a/src/main/java/com/google/devtools/build/lib/skyframe/FilesystemValueChecker.java b/src/main/java/com/google/devtools/build/lib/skyframe/FilesystemValueChecker.java
index 60a08ddf68..363c5291af 100644
--- a/src/main/java/com/google/devtools/build/lib/skyframe/FilesystemValueChecker.java
+++ b/src/main/java/com/google/devtools/build/lib/skyframe/FilesystemValueChecker.java
@@ -539,6 +539,7 @@ public class FilesystemValueChecker {
for (Map.Entry<Artifact, FileArtifactValue> entry : actionValue.getAllFileValues().entrySet()) {
if (artifactIsDirtyWithDirectSystemCalls(
knownModifiedOutputFiles, remoteArtifactChecker, entry, modifiedOutputsReceiver)) {
+ System.err.println("artifact is dirty: " + entry.getKey() + " " + entry.getValue());
isDirty = true;
}
}
@@ -551,6 +552,8 @@ public class FilesystemValueChecker {
tree.getChildValues().entrySet()) {
if (artifactIsDirtyWithDirectSystemCalls(
knownModifiedOutputFiles, remoteArtifactChecker, childEntry, modifiedOutputsReceiver)) {
+ System.err.println(
+ "tree artifact entry is dirty: " + childEntry.getKey() + " " + childEntry.getValue());
isDirty = true;
}
}
@@ -572,6 +575,7 @@ public class FilesystemValueChecker {
if (shouldCheckTreeArtifact(sortedKnownModifiedOutputFiles.get(), treeArtifact)
&& treeArtifactIsDirty(treeArtifact, entry.getValue())) {
// Count the changed directory as one "file".
+ System.err.println("tree artifact entry is dirty: " + treeArtifact);
modifiedOutputsReceiver.reportModifiedOutputFile(
getBestEffortModifiedTime(treeArtifact.getPath()), treeArtifact);
isDirty = true;
diff --git a/src/main/java/com/google/devtools/build/skyframe/EagerInvalidator.java b/src/main/java/com/google/devtools/build/skyframe/EagerInvalidator.java
index f8f2c0d560..07988108b3 100644
--- a/src/main/java/com/google/devtools/build/skyframe/EagerInvalidator.java
+++ b/src/main/java/com/google/devtools/build/skyframe/EagerInvalidator.java
@@ -82,6 +82,7 @@ public final class EagerInvalidator {
DirtyAndInflightTrackingProgressReceiver progressReceiver,
InvalidationState state)
throws InterruptedException {
+ System.err.println("Invalidating: " + diff + " " + state);
DirtyingNodeVisitor visitor =
createInvalidatingVisitorIfNeeded(graph, diff, progressReceiver, state);
if (visitor != null) {
cc @tjgq @coeuvre
~~Probably this: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/skyframe/FilesystemValueChecker.java;l=121;drc=e03de754ec15096fcd1f701ea0b2d4c09c4b97a1~~
Can you repro with --noexperimental_merged_skyframe_analysis_execution?
so they are still missing on the next invocation.
Are these outputs should be downloaded (e.g. toplevel outputs, or specified by --remote_download_regex)?
If Bazel marks an action dirty to trigger a download but then doesn't download it because it shouldn't, then it's a performance issue. Otherwise, it's a correctness issue.
Can you repro with
--noexperimental_merged_skyframe_analysis_execution?
I can't.
Are these outputs should be downloaded (e.g. toplevel outputs, or specified by
--remote_download_regex)?
No, these outputs should not be downloaded (just hjars, jdeps files, ...), so I think this is just a performance issue.
This is the root cause: https://github.com/bazelbuild/bazel/blob/43ad74bec433c1923e2ce78605ea04cac0cdb324/src/main/java/com/google/devtools/build/lib/buildtool/ExecutionTool.java#L328-L337
Note the RemoteArtifactChecker.IGNORE_ALL, which means that all remote artifacts will be marked as dirty. Happy to work on this, are there any ideas of what would need to be done to improve this? CC @joeleba
skymeld calls skyframeExecutor.detectModifiedOutputFiles when the first toplevel target is analyzed. However, the RemoteArtifactChecker for BwoB needs full analyze result to be able to correctly determine which action should be marked as dirty because it records the path of toplevel outputs in a trie: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/remote/RemoteOutputChecker.java;l=128;drc=6f48f1c2b3bb73768b9ff15ce6698e21eddc503a (there are some skymeld only code, but is for clean build). Otherwise, actions for other toplevel targets will not be invalidated. So we decided to use RemoteArtifactChecker.IGNORE_ALL for now.
One potential solution I have discussed with @joeleba offline is to let skymeld calls skyframeExecutor.detectModifiedOutputFiles for each toplevel target so that the RemoteArtifactChecker.IGNORE_ALL can be replaced with the one from BowB.
This is the root cause:
Argh, I intended to post this link but somehow my clipboard is corrupted.
This is a known limitation of Skymeld + BwoB. It's a tradeoff essentially: having both skymeld + bwob improves clean builds' performance, but comes with a performance penalty for incremental builds. Luckily the penalty only comes from repeating the skyframe work, since the have other layers of caches for actions.
In the current state, detectModifiedOutputFiles is called once before any action execution in the build. At this point, it's not possible to construct the full picture of what's dirty or not in the RemoteArtifactChecker (only available after the full analysis).
To resolve this, we would need to do detectModifiedOutputFiles incrementally, as the information for each top level target/aspect becomes available.
We reproduced this with just the remote cache. Can this have a higher than P3 priority?
Raising it to P2 but we probably won't have time to work on the fix until next month or two. In the mean while, PR is welcome!
Friendly ping on this.
I have a proper fix and should be able to submit it in a few days.
@bazel-io fork 7.4.0
This is fixed both in master and release-7.4.0 branches.
@brentleyjones Can you verify whether it improves your builds?
Thanks! I'll have to wait for an RC to start validation.
A fix for this issue has been included in Bazel 7.4.0 RC1. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.4.0rc1. Thanks!