Bazel 7 unable to finalize action due to missing digest for `.d` files when `--experimental_inmemory_dotd_files` is set.

Open luispadron opened this issue 9 months ago • 2 comments

Description of the bug:

When using --experimental_inmemory_dotd_files, which seems to be the default (at least in Bazel 7), actions that produce .d files fail with a missing digest error.

ERROR: Foo/BUILD.bazel:11:15: Compiling Foo.c failed: unable to finalize action: Missing digest: <number>/<number> for bazel-out/ios_arm64-opt-ios-arm64-min12.0-applebin_ios-ST-<sha>/bin/path/to/Foo.d
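For anyone collecting these failures from CI logs, here is a minimal Python sketch that pulls the digest and output path out of such a line. It assumes the two placeholder fields are a hex hash followed by a size in bytes; that layout is an assumption based on the message above, and the names `MissingDigest` and `parse_missing_digest` are hypothetical helpers, not part of Bazel.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "Missing digest: <hex hash>/<size in bytes> for <output path>".
_MISSING_DIGEST = re.compile(
    r"Missing digest: (?P<hash>[0-9a-fA-F]+)/(?P<size>\d+) for (?P<path>\S+)"
)


class MissingDigest(NamedTuple):
    digest_hash: str  # hex hash of the blob Bazel could not find
    size_bytes: int   # blob size as reported in the message
    path: str         # output path the digest belongs to


def parse_missing_digest(line: str) -> Optional[MissingDigest]:
    """Return the parsed fields, or None if the line is not a 'Missing digest' error."""
    match = _MISSING_DIGEST.search(line)
    if match is None:
        return None
    return MissingDigest(match.group("hash"), int(match.group("size")), match.group("path"))
```

Running this over every line of a failed build log and keeping the non-None results gives the list of affected outputs, which makes it easier to check whether only .d files are involved.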

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I haven't found a way to consistently reproduce this locally. Our CI machines are configured to:

  • Not use a disk cache
  • Not use remote execution
  • Use a remote cache

These machines failed several times in our Bazel 7 testing. After setting --noexperimental_inmemory_dotd_files, we no longer saw this issue.

Which operating system are you running Bazel on?

macOS

What is the output of bazel info release?

release 7.1.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

Yes. We never saw this issue in Bazel 6, and not much has changed in our flags for our Bazel 7 testing.

Have you found anything relevant by searching the web?

  • https://github.com/bazelbuild/bazel/issues/20123 is potentially related but not exactly the same

Any other information, logs, or outputs that you want to share?

No response

luispadron avatar May 15 '24 21:05 luispadron

We looked for something that could go wrong with the combination of --remote_cache, --remote_download_all and --experimental_inmemory_dotd_files, but don't have a plausible theory yet (other than the remote cache spuriously evicting blobs - but that doesn't explain why it only happens with .d files, and only when in-memory outputs are enabled).

Can you provide the following information:

  • The complete list of Bazel flags you're using
  • The remote cache implementation you're using
  • The --experimental_remote_grpc_log for one of the failed invocations (feel free to scrub sensitive data, but please preserve the digests, or rewrite them in such a way that they match up between gRPC requests)

In addition, it would be helpful to know the following:

  • Can you repro this against a disk cache, or a different remote cache implementation? (e.g. a simple HTTP cache that is guaranteed to never evict any blobs on its own)

tjgq avatar May 28 '24 12:05 tjgq

Thanks for investigating @tjgq

I can provide the first two now and look at the execution log when I get a chance:

  • The --announce_rc logs for our flags in CI:
INFO: Invocation ID: <ID>
INFO: Reading 'startup' options from /Users/build/.jenkins/workspace/cash-ios/ios-builder/s/c/.bazelrc: --host_jvm_args=-Djavax.net.ssl.trustStore=Configuration/Java.cacerts, --host_jvm_args=-Djavax.net.ssl.trustStorePassword=changeit, --host_jvm_args=-DBAZEL_TRACK_SOURCE_DIRECTORIES=1, --max_idle_secs=86400, --digest_function=blake3
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /Users/build/.jenkins/workspace/cash-ios/ios-builder/s/c/.bazelrc:
  Inherited 'common' options: --remote_header=hashfn=blake3 --lockfile_mode=update --incompatible_disallow_empty_glob --experimental_repository_downloader_retries=3 --incompatible_strict_action_env=true --spawn_strategy=local --verbose_failures --test_output=errors --max_config_changes_to_show=-1 --attempt_to_print_relative_paths --experimental_inprocess_symlink_creation --keep_going --output_filter=^//.*:((?!(SwiftLintCore|SwiftLintBuiltInRules).*).)*$ --noexperimental_inmemory_dotd_files --compilation_mode=dbg --@build_bazel_rules_swift//swift:copt=-whole-module-optimization --@build_bazel_rules_swift//swift:exec_copt=-whole-module-optimization --@rules_xcodeproj//xcodeproj:extra_common_flags=--//Bazel:is_building_in_xcode=0 --features=swift.emit_symbol_graph_extension_blocks --action_env=CACHE_EPOCH=4 --remote_download_outputs=all --config=cache_cdn_read --noremote_upload_local_results --remote_local_fallback --experimental_remote_merkle_tree_cache --experimental_guard_against_concurrent_changes --disk_cache=~/Library/Caches/bazel-cash-ios-cache --remote_build_event_upload=minimal --nolegacy_important_outputs --modify_execution_info=^(AppleLipo|BitcodeSymbolsCopy|BundleApp|BundleTreeApp|DsymDwarf|DsymLipo|GenerateAppleSymbolsFile|ObjcBinarySymbolStrip|CppArchive|CppLink|ObjcLink|ProcessAndSign|SignBinary|SwiftArchive|SwiftStdlibCopy|PackagingFramework.+|ExtendModulemap|HmapCreate)$=+no-remote,^(BundleResources|ImportedDynamicFrameworkProcessor)$=+no-remote-exec --remote_cache_compression=true --xcode_version_config=//Bazel:host_xcodes --macos_minimum_os=13.0 --host_macos_minimum_os=13.0 --config virtual_frameworks --features=-swift.vfsoverlay --@build_bazel_rules_apple//apple/build_settings:use_tree_artifacts_outputs=true --define=apple.incompatible.objc_framework_propagate_modulemap=true
INFO: Reading rc options for 'build' from /Users/build/.jenkins/workspace/cash-ios/ios-builder/s/c/.bazelrc:
  'build' options: --flag_alias=build_config=//Bazel:build_config --flag_alias=release_variant=//Bazel:release_variant --flag_alias=xcscheme=//Bazel/apple/xcschemes:xcscheme
INFO: Found applicable config definition common:cache_cdn_read in file /Users/build/.jenkins/workspace/cash-ios/ios-builder/s/c/.bazelrc: --remote_cache=<REDACTED>
INFO: Found applicable config definition common:virtual_frameworks in file /Users/build/.jenkins/workspace/cash-ios/ios-builder/s/c/.bazelrc: --features apple.virtualize_frameworks
INFO: Found applicable config definition common:ci in file /Users/build/.jenkins/workspace/cash-ios/ios-builder/s/c/.bazelrc: --remote_upload_local_results --build_metadata=ROLE=CI --announce_rc --color=no --curses=no --noshow_loading_progress --show_progress_rate_limit=15.0 --progress_report_interval=60 --disk_cache=
INFO: Found applicable config definition common:cache_grpc in file /Users/build/.jenkins/workspace/cash-ios/ios-builder/s/c/.bazelrc: --remote_cache=grpcs://bazel-remote-vpce-service-privatelink.squarecloudservices.com --experimental_remote_cache_async=true
INFO: Found applicable config definition common:ios_release in file /Users/build/.jenkins/workspace/cash-ios/ios-builder/s/c/.bazelrc: --config=release --ios_multi_cpus=arm64 --@build_bazel_rules_apple//apple/build_settings:use_tree_artifacts_outputs=false --config=generate_dsym --objc_enable_binary_stripping --define=apple.trim_lproj_locales=yes --features=dead_strip --features=swift.opt_uses_wmo --@build_bazel_rules_swift//swift:copt=-Xfrontend --@build_bazel_rules_swift//swift:copt=-internalize-at-link
INFO: Found applicable config definition common:release in file /Users/build/.jenkins/workspace/cash-ios/ios-builder/s/c/.bazelrc: --build_config=release --compilation_mode=opt --//Pods/cocoapods-bazel:config=release --//Pods/cocoapods-bazel:deps_config=deps_release
INFO: Found applicable config definition common:generate_dsym in file /Users/build/.jenkins/workspace/cash-ios/ios-builder/s/c/.bazelrc: --apple_generate_dsym --output_groups=+dsyms
INFO: Found applicable config definition common:alpha in file /Users/build/.jenkins/workspace/cash-ios/ios-builder/s/c/.bazelrc: --release_variant=alpha
  • The remote cache implementation we're using: https://github.com/bazel-ios/bazel-buildfarm/tree/bazel-ios-fork

luispadron avatar May 28 '24 20:05 luispadron

@luispadron Can you provide the --experimental_remote_grpc_log for a build exhibiting this failure? Otherwise, it's going to be difficult to make progress on this.

tjgq avatar Jul 23 '24 09:07 tjgq

Hi. We're also seeing this issue, although it's very intermittent: only 6 of 46,046 builds in the last week were affected. We're using a simple HTTP cache (nginx caching proxy in front of Artifactory) and are quite confident it's not a cache issue.

I notice that in your first message you jumped to --remote_download_all. Our builds are a mixture of --remote_download_all and --remote_download_toplevel, and while we aren't seeing this often, it does appear to happen only with --remote_download_all builds.

Please let us know what additional information/logs would be helpful.

miscott2 avatar Jul 30 '24 09:07 miscott2

@miscott2 I think the --experimental_remote_grpc_log for one of the failed runs would be the most useful piece of information here. (You can scrub any sensitive information, but please preserve the digests.)

tjgq avatar Jul 30 '24 09:07 tjgq

Oh wait, but if you're using an HTTP cache, there's no gRPC log; never mind.

tjgq avatar Jul 30 '24 09:07 tjgq

@tjgq Since, as miscott2 says, we're using HTTP rather than gRPC, is there some other information that would be useful in that case? In one of your previous posts you asked whether this could be reproduced using an HTTP cache, to which the answer very much appears to be "yes", so is there a way to gather useful info in that case?

NeilKetley avatar Sep 27 '24 12:09 NeilKetley

@NeilKetley What's the eviction policy for your HTTP cache? i.e., do you have any automated process that periodically removes old entries from the cache? Would you be able to confirm whether the blob in question was present in the cache at some point, but later got deleted? I'm wondering whether this might be just a special case of #18696.
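One way to check whether a blob is currently present is to query the cache directly. The sketch below assumes the cache follows Bazel's HTTP caching protocol, where CAS blobs live under `/cas/<hash>`, and that the server answers HEAD requests (nginx normally does, but treat that as an assumption for your setup). The base URL and the helper names `cas_url` and `blob_present` are hypothetical.

```python
import urllib.error
import urllib.request


def cas_url(cache_base: str, digest_hash: str) -> str:
    """Build the CAS URL for a blob, tolerating a trailing slash on the base URL."""
    return f"{cache_base.rstrip('/')}/cas/{digest_hash}"


def blob_present(cache_base: str, digest_hash: str, timeout: float = 10.0) -> bool:
    """Return True if the cache answers 2xx for the blob, False on 404.

    Other HTTP errors (auth failures, 5xx) are re-raised so they are not
    mistaken for an evicted blob.
    """
    request = urllib.request.Request(cas_url(cache_base, digest_hash), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise
```

For example, `blob_present("https://cache.example.com", "<hash from the error message>")` tells you only whether the blob exists now; confirming that it existed earlier and was later deleted would still require the cache's own access or eviction logs.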

tjgq avatar Sep 27 '24 12:09 tjgq

I'm going with the theory that this is the same as #18696, which has been fixed in 7.4.0. Please reopen if you're seeing similar failures in 7.4.0 or later.

tjgq avatar Oct 24 '24 14:10 tjgq

@tjgq apologies for not responding sooner. We are still attempting to repro and collect the information / answer the questions you posed previously. I do not think we will be able to try a later version of Bazel at this point since this issue is happening in our live build system, not really open for experimentation, but we will collect the info requested and hope that this will either confirm your suspicion or show otherwise.

NeilKetley avatar Oct 25 '24 09:10 NeilKetley