
Maintain multiple analyses in analysis cache

[Open] irfansharif opened this issue 4 years ago • 19 comments

Description of the problem / feature request:

Bazel seems to maintain only one analysis in the analysis cache. Whenever a relevant flag changes, the cache is purged in its entirety. This adds to build time even with a 100% cache hit rate for the build artifacts (see repro below). Workarounds I've seen so far suggest using a different --output_base for each set of flags, which is pretty cumbersome and uses a ton of disk space. Envoy has gotten around this by sharing the same set of flags for both build and test, but that's not always possible, and it is only a stopgap until Bazel can preserve earlier analyses in the analysis cache for possible future reuse.
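For illustration, the Envoy-style approach boils down to passing one identical set of analysis-affecting flags to every Bazel command. Below is a minimal shell sketch under stated assumptions: the `bz` wrapper name and the `--define`/`--test_env` values are hypothetical placeholders, not Envoy's or CockroachDB's actual setup.

```shell
#!/usr/bin/env bash
# Sketch: keep every analysis-affecting flag identical between `bazel build`
# and `bazel test`, so switching commands should not discard the analysis cache.
# The flag values below are hypothetical placeholders.
SHARED_FLAGS=(
  --define=example_key=example_value
  --test_env=EXAMPLE_VAR=1
)

# Run a Bazel command with the shared flags inserted right after the command name.
# Usage: bz COMMAND [ARGS...]
bz() {
  local cmd="$1"
  shift
  bazel "$cmd" "${SHARED_FLAGS[@]}" "$@"
}

# Example:
#   bz build //pkg/cmd/cockroach-short:cockroach-short
#   bz test //pkg/spanconfig/spanconfigkvsubscriber:spanconfigkvsubscriber_test
```

Declaring the same flags under `build` in `.bazelrc`, rather than under `test`, should have a similar effect, since `bazel test` inherits the options of `bazel build`.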

Feature requests: what underlying problem are you trying to solve with this feature?

It's a common workflow to switch between building and testing a Bazel project. For us, that entails using different --define and --test_env flags, which discards the analysis cache entirely. Because Bazel maintains only one copy of the analysis cache (from the last invocation), we end up doing a lot of extra work to re-analyze the build even though 100% of the build artifacts are present in the remote cache. Maintaining multiple analyses in the analysis cache, each tagged with the build options it is valid for, would cut our iteration time down substantially. The same holds for CI: even when all the necessary artifacts are present, thrashing the analysis cache makes us do a lot of unnecessary work.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

With https://github.com/cockroachdb/cockroach:

$ bazel build //pkg/cmd/cockroach-short:cockroach-short
INFO: Build options --define and --test_env have changed, discarding analysis cache.
INFO: Analyzed target //pkg/cmd/cockroach-short:cockroach-short (0 packages loaded, 18793 targets configured).
INFO: Found 1 target...
Target //pkg/cmd/cockroach-short:cockroach-short up-to-date:
  _bazel/bin/pkg/cmd/cockroach-short/cockroach-short_/cockroach-short
INFO: Elapsed time: 25.610s, Critical Path: 20.89s
INFO: 1995 processes: 1994 disk cache hit, 1 internal.
INFO: Build completed successfully, 1995 total actions
Successfully built binary for target //pkg/cmd/cockroach-short:cockroach-short at cockroach-short

$ bazel test //pkg/spanconfig/spanconfigkvsubscriber:spanconfigkvsubscriber_test --test_filter=TestDataDriven/basic --test_output errors
INFO: Build options --define and --test_env have changed, discarding analysis cache.
INFO: Analyzed target //pkg/spanconfig/spanconfigkvsubscriber:spanconfigkvsubscriber_test (0 packages loaded, 17487 targets configured).
INFO: Found 1 test target...
Target //pkg/spanconfig/spanconfigkvsubscriber:spanconfigkvsubscriber_test up-to-date:
  _bazel/bin/pkg/spanconfig/spanconfigkvsubscriber/spanconfigkvsubscriber_test_/spanconfigkvsubscriber_test
INFO: Elapsed time: 26.116s, Critical Path: 24.73s
INFO: 1995 processes: 1994 disk cache hit, 1 internal.
INFO: Build completed successfully, 1995 total actions
//pkg/spanconfig/spanconfigkvsubscriber:spanconfigkvsubscriber_test (cached) PASSED in 1.8s

Executed 0 out of 1 test: 1 test passes.
INFO: Build completed successfully, 1995 total actions

$ bazel build //pkg/cmd/cockroach-short:cockroach-short
WARNING: Ignoring JAVA_HOME, because it must point to a JDK, not a JRE.
INFO: Invocation ID: 7822bf1c-b690-4138-bdf9-45a4b1eed074
INFO: Build options --define and --test_env have changed, discarding analysis cache.
INFO: Analyzed target //pkg/cmd/cockroach-short:cockroach-short (0 packages loaded, 18793 targets configured).
INFO: Found 1 target...
Target //pkg/cmd/cockroach-short:cockroach-short up-to-date:
  _bazel/bin/pkg/cmd/cockroach-short/cockroach-short_/cockroach-short
INFO: Elapsed time: 25.610s, Critical Path: 20.89s
INFO: 1995 processes: 1994 disk cache hit, 1 internal.
INFO: Build completed successfully, 1995 total actions
Successfully built binary for target //pkg/cmd/cockroach-short:cockroach-short at cockroach-short

Observe that when switching between bazel build and bazel test, each invocation takes ~25s to complete despite a 100% disk cache hit rate, because the previous run's analysis cache has been discarded and 17,487 to 18,793 targets must be configured again. Compare that with a run where the analysis cache is still available:

$ bazel build //pkg/cmd/cockroach-short:cockroach-short --profile=build.gz
INFO: Analyzed target //pkg/cmd/cockroach-short:cockroach-short (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //pkg/cmd/cockroach-short:cockroach-short up-to-date:
  _bazel/bin/pkg/cmd/cockroach-short/cockroach-short_/cockroach-short
INFO: Elapsed time: 1.259s, Critical Path: 0.83s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
Successfully built binary for target //pkg/cmd/cockroach-short:cockroach-short at cockroach-short

What operating system are you running Bazel on?

macOS Big Sur.

What's the output of bazel info release?

release 6.0.0-pre.20211019.1, though the same problem exists from 4.0.0 onwards.

Have you found anything relevant by searching the web?

Some other relevant threads:
  • Mailing list: https://groups.google.com/g/bazel-discuss/c/EdBvMEPrH5A and https://groups.google.com/g/bazel-discuss/c/vgn9uyyIrIM/m/4trNkRS0AQAJ
  • GitHub: #11194, https://github.com/bazelbuild/bazel/issues/12113#issuecomment-952602517
  • Stack Overflow: https://stackoverflow.com/questions/53012722/why-does-bazel-do-a-full-rebuild-whenever-switching-between-intellij-and-command

irfansharif, Oct 27 '21

@irfansharif Unfortunately, I don't have any code left from my experiments of not deleting the analysis cache. I threw it away as it didn't work.

moroten, Oct 27 '21

I want to signal-boost @moroten's discussion listed above: https://groups.google.com/g/bazel-discuss/c/vgn9uyyIrIM/m/4trNkRS0AQAJ

The biggest challenge is addressing correctness concerns.

Partial approaches could be viable, with trim_test_configuration as precedent.

gregestren, Nov 11 '21

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

github-actions[bot], May 25 '23

This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team (@bazelbuild/triage). Thanks!

github-actions[bot], Jun 08 '23

FYI, a decent workaround is to use the --output_base flag. It's not ideal, but if you're switching between configs, you can also switch between output bases and have a whole separate cache.

matts1, Jun 08 '23

To clarify @matts1's workaround: changing --output_base starts a new instance of Bazel, so you have one Bazel server for each analysis configuration you want to keep.

pauldraper, Oct 24 '24

That's correct, yeah

matts1, Oct 24 '24
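As an illustration of the --output_base workaround discussed above, here is a minimal shell sketch. The `bazel_in_config` and `output_base_for` helpers, the flag-set names, and the default root directory are all hypothetical; the only real requirement it relies on is that --output_base is a startup option and must come before the command name.

```shell
#!/usr/bin/env bash
# Sketch: each named flag set gets its own output base, and therefore its own
# Bazel server and in-memory analysis cache.
# The default root directory is a hypothetical choice; any writable path works.
OUTPUT_BASE_ROOT="${OUTPUT_BASE_ROOT:-$HOME/.cache/bazel-output-bases}"

# Print the output base directory used for a named flag set.
output_base_for() {
  printf '%s/%s\n' "$OUTPUT_BASE_ROOT" "$1"
}

# Usage: bazel_in_config NAME COMMAND [ARGS...]
# --output_base is a startup option, so it is placed before the command name.
bazel_in_config() {
  local name="$1"
  shift
  bazel --output_base="$(output_base_for "$name")" "$@"
}

# Example: build and test each keep a warm analysis cache in separate servers.
#   bazel_in_config build-flags build //pkg/cmd/cockroach-short:cockroach-short
#   bazel_in_config test-flags test //pkg/spanconfig/spanconfigkvsubscriber:spanconfigkvsubscriber_test
```

The trade-off is the one raised in this thread: every server costs its own memory and its own copy of the outputs on disk. Pointing all output bases at the same --disk_cache should at least let them reuse each other's action results.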

We can see two improvements to this:

  • Depending on which flags you change, we think we can easily avoid redoing the exec-configured part of the cache, which can be half the cache for many builds. --define would probably fit this, and I think --test_env would too, although Blaze (Google's internal version of Bazel) has some special injection logic for that. @katre and @susinmotion are looking at that.

  • @jin is working on cross-build analysis caching that would make swapping in/out the analysis cache much faster, even if invalidation still occurs.

gregestren, Nov 05 '24

Could we reopen this issue, since it remains relevant and is being actively discussed?

tpudlik, Dec 02 '24

> @jin is working on cross-build analysis caching that would make swapping in/out the analysis cache much faster, even if invalidation still occurs.

Yes: https://www.youtube.com/watch?v=op4gIYxucjE is a BazelCon 2024 talk about cross-build analysis caching.

We are still working on the internal reference implementation of this, and will prioritize Bazel support after that.

jin, Dec 04 '24

Since this issue is specifically about --define and --test_env, two more things:

  • @fmeum did some work to optimize --test_env: https://github.com/bazelbuild/bazel/issues/7450
  • @fmeum is considering optimizations for Starlark flags as a bigger project: https://github.com/bazelbuild/bazel/issues/13591#issuecomment-2521428998. If you convert your --define to a Starlark flag, that optimization would kick in.

gregestren, Dec 05 '24

Just to be clear, who still cares about this besides @pauldraper, @tpudlik, and @matts1? These comments span a few years.

gregestren, Dec 05 '24

I think, in general, anyone who uses persistent CI runners and tests multiple configurations cares about reducing this overhead as much as possible. Having N output bases is annoying, at the very least.

keith, Mar 20 '25

@gregestren Could you give an update on Skycache?

fmeum, Mar 20 '25

> Depending on which flags you change, we think we can easily avoid redoing the exec-configured part of the cache, which can be half the cache for many builds. --define would probably fit this. I think --test_env too although Blaze has some special injection logic for that. @katre and @susinmotion are looking at that.

This is enabled and shows great reduction in re-analysis time for Google's repo. I'm curious what it shows in other environments. That said, this is only an incremental improvement.

> @gregestren Could you give an update on Skycache?

@greatfilter can clarify.

gregestren, Mar 20 '25

> This is enabled and shows great reduction in re-analysis time for Google's repo. I'm curious what it shows in other environments. That said, this is only an incremental improvement.

I just tested this flag on our project and didn't see a meaningful difference. I imagine that's just a reflection of how many files we have compared to Google.

keith, Mar 20 '25

Thanks for testing, @keith . Which flags did you toggle?

I forgot there's another important difference. That change only applies to flags that don't propagate from the target to exec config. Google's repo activated these changes so --define and Starlark flags don't propagate, which by itself made a nice resource difference.

We haven't made that the default for Bazel because we need an escape hatch for flags that do want to propagate. @aranguyen is close to adding Starlark flag support for that, so we can opt out Starlark flags.

gregestren, Mar 20 '25

We are also using --experimental_exclude_starlark_flags_from_exec_config for other reasons, and we don't use defines.

In our case, it is a Starlark flag that controls behavior similar to --compilation_mode, but the change might also include --compilation_mode, --strip, --per_file_copt, --action_env, --platforms, etc.

keith, Mar 20 '25

I am also interested in this. Since we have multiple platforms in our monorepo, aligning the flags perfectly is not really feasible. Having multiple output bases works, but it's something we would like to avoid if Bazel could handle multiple caches.

rockbruno, Jun 11 '25