Flaky Bazel internal crash `IllegalStateException: Not action: CppCompileActionTemplate` when using Skymeld
Description of the bug:
Since upgrading from Bazel 6.4 to 7.2, we've been hitting a flaky Bazel internal crash that seems to be related to Skymeld and a TreeArtifact-based cc_library (similar setup to #22886, but see below).
We see the following crash:
[22,990 / 25,056] checking cached actions
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: Unrecoverable error while evaluating node 'TargetCompletionKey{topLevelArtifactContext=com.google.devtools.build.lib.analysis.TopLevelArtifactContext@90904c3b, actionLookupKey=ConfiguredTargetKey{label=<top level general cc library target, not from generator>, config=BuildConfigurationKey[6de9c493725e885249a68bcd3cab225a7c98a12a462c2ead63bd885b18e247ba]}, willTest=false}' (requested by nodes 'BuildDriverKey of ActionLookupKey: ConfiguredTargetKey{label=<top level cc library target, not from generator>, config=BuildConfigurationKey[6de9c493725e885249a68bcd3cab225a7c98a12a462c2ead63bd885b18e247ba]}')
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:550)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:414)
	at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: java.lang.IllegalStateException: Not action: CppCompileActionTemplate compiling <bazel-out path of .cc from cc_library of generator> 0 RuleConfiguredTargetValue{actions=[CppCompileActionTemplate compiling <bazel-out path of .cc from cc_library of generator>, action '<path of .a from cc_library of generator>' (CppArchive[[File:[[<execution_root>]bazel-out/k8-dbg--cd/bin]<redacted>/_objs/redacted-cc-lib/redacted] -> [File:[[<execution_root>]bazel-out/k8-dbg--cd/bin]<redacted>/libredacted-cc-lib.a]])], configuredTarget=ConfiguredTarget(<cc library target from generator>, b75007340468b702430064e766d5f8f577cdff419d7ca8b572b796f7e9104d61)}
	at com.google.devtools.build.lib.actions.ActionLookupValue.getAction(ActionLookupValue.java:34)
	at com.google.devtools.build.lib.skyframe.ActionUtils.getActionForLookupData(ActionUtils.java:31)
	at com.google.devtools.build.lib.skyframe.CompletionFunction.ensureToplevelArtifacts(CompletionFunction.java:393)
	at com.google.devtools.build.lib.skyframe.CompletionFunction.compute(CompletionFunction.java:329)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:461)
	... 7 more
The crash is intermittent: if we rerun the exact same build immediately afterwards, it does not recur (some sort of inconsistent state or race?). The CppCompileActionTemplate in the error always belongs to one of the cc_library targets created by the TreeArtifact-based generator, never to any other target. The top-level target is unrelated and varies; it is simply a target with a (transitive) dependency on the generated cc_library.
Full generator setup:
def _generate_api_files_impl(ctx):
    # We need to put the C++ files in a folder named like a C++ file to trick Bazel into accepting these folders as
    # sources and headers when creating a C++ library.
    srcs_tree = ctx.actions.declare_directory(ctx.attr.name + ".cc")
    hdrs_tree = ctx.actions.declare_directory(ctx.attr.name + ".hh")
    java_tree = ctx.actions.declare_directory(ctx.attr.name + "-java-srcs")
    ctx.actions.run(
        executable = ctx.executable.generator,
        outputs = [srcs_tree, hdrs_tree, java_tree],
        arguments = [srcs_tree.path, hdrs_tree.path, java_tree.path],
    )
    srcjar = ctx.actions.declare_file(ctx.attr.name + ".srcjar")
    create_srcjar_rule(ctx, java_tree, srcjar, ctx.executable._build_zip)
    return [DefaultInfo(files = depset([srcs_tree, hdrs_tree, srcjar]))]

generate_api_files = rule(
    implementation = _generate_api_files_impl,
    attrs = {
        "generator": attr.label(executable = True, cfg = "exec"),
        "_build_zip": attr.label(default = Label(BUILD_ZIP_TOOL), cfg = "exec", executable = True),
    },
)

def generate_api(name, generator):
    generate_api_files(name = name, generator = generator)
    cc_library(
        name = name + "-cc-lib",
        srcs = [name],
        hdrs = [name],
    )
    java_library(
        name = name + "-java-lib",
        srcs = [
            ":" + name,
        ],
    )
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Unfortunately, we have not yet been able to reproduce this consistently. Since setting --noexperimental_merged_skyframe_analysis_execution, we have not seen the crash for a week. We are open to suggestions on how to debug it further.
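To apply the same workaround persistently instead of passing the flag on every invocation, it can go into a `.bazelrc`. This is a minimal sketch, assuming a `.bazelrc` at the workspace root; options set under `build` are also inherited by `test` and `run`. Note that this turns Skymeld off entirely, so analysis and execution no longer interleave.

```
# Workaround for the flaky "Not action: CppCompileActionTemplate" crash:
# disable Skymeld (merged Skyframe analysis and execution).
build --noexperimental_merged_skyframe_analysis_execution
```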
Which operating system are you running Bazel on?
Rocky Linux 9.3
What is the output of bazel info release?
release 7.2.1
cc @joeleba
We are experiencing the same issue. Is there any way we can help you to reproduce the issue?
After digging through the call stack for a bit, I think this may have regressed in https://github.com/bazelbuild/bazel/issues/20737, which, suspiciously, adds a Skymeld-only code path that can end up invoking ActionUtils.getActionForLookupData on tree artifact children.
CC @coeuvre
@JohnnyMorganz, @fmeum, has either of you followed up on this issue or tested whether the error persists in newer versions?
Nothing further from our side. We still have --noexperimental_merged_skyframe_analysis_execution set and haven't tried removing it since creating this issue. We are on Bazel 7.5.
This should be fixed by https://github.com/bazelbuild/bazel/commit/00bb86b01397d3ad6f3794077fea2958c06d817b which is in Bazel 9. Cherry-picking it into 8.5.0 may be possible if that helps.