GoLink actions wait too long with Bazel 7 and remote execution
What version of rules_go are you using?
v0.50.1
What version of gazelle are you using?
v0.38.0
What version of Bazel are you using?
[root@10-2-12-124 tidb]# bazelisk version
Bazelisk version: v1.17.0
Starting local Bazel server and connecting to it...
Build label: 7.3.1
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Aug 19 16:12:50 2024 (1724083970)
Build timestamp: 1724083970
Build timestamp as int: 1724083970
Does this issue reproduce with the latest releases of all the above?
I ran bazel build //... with remote execution (bazel-buildfarm v2.11.1) and the following .bazelrc options:
build --announce_rc
build --experimental_guard_against_concurrent_changes
build --experimental_remote_merkle_tree_cache
build --disk_cache=/data1/bazel/cache
build --experimental_remote_cache_compression
build --repository_cache=/data1/bazel/cache
run --color=yes
build --remote_executor=grpc://xxxx:8980
build --jobs 100
What operating system and processor architecture are you using?
Any other potentially useful information about your toolchain?
What did you do?
What did you expect to see?
The build finishes with Bazel v6.5:
https://tiprow.hawkingrei.com/view/gs/pingcapprow/pr-logs/pull/pingcap_tidb/51126/fast_test_tiprow/1838293257872740352#1:build-log.txt%3A11
What did you see instead?
The build waits for a long time. It never finishes and does not report an error.
Starting local Bazel server and connecting to it...
INFO: Invocation ID: aa345b60-b676-4ecb-bca1-12c92e87a140
INFO: Reading 'startup' options from /data3/wangweizhen/tidb/.bazelrc: --host_jvm_args=-Xmx4g, --unlimit_coredumps
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=148
INFO: Reading rc options for 'build' from /data3/wangweizhen/tidb/.bazelrc:
'build' options: --announce_rc --experimental_guard_against_concurrent_changes --experimental_remote_merkle_tree_cache --java_language_version=17 --java_runtime_version=17 --tool_java_language_version=17 --tool_java_runtime_version=17 --incompatible_strict_action_env --incompatible_enable_cc_toolchain_resolution
INFO: Reading rc options for 'build' from /root/.bazelrc:
'build' options: --announce_rc --experimental_guard_against_concurrent_changes --experimental_remote_merkle_tree_cache --disk_cache=/data1/bazel/cache --experimental_remote_cache_compression --repository_cache=/data1/bazel/cache --remote_executor=grpc://10.2.12.124:8980 --jobs 100
WARNING: Found stale downloads from previous build, deleting...
INFO: Analyzed 1202 targets (2310 packages loaded, 29891 targets configured).
[23,322 / 24,720] 100 actions, 0 running
[Prepa] GoLink br/pkg/checkpoint/checkpoint_test_/checkpoint_test; 213s
[Prepa] GoLink pkg/lightning/backend/kv/kv_test_/kv_test; 213s
[Prepa] GoLink pkg/sessionctx/stmtctx/stmtctx_test_/stmtctx_test; 212s
[Prepa] GoLink pkg/sessionctx/sessionstates/sessionstates_test_/sessionstates_test; 212s
[Prepa] GoLink pkg/meta/meta_test_/meta_test; 212s
[Prepa] GoLink pkg/statistics/statistics_test_/statistics_test; 212s
[Prepa] GoLink pkg/extension/extension_test_/extension_test; 212s
[Prepa] GoLink pkg/table/tables/tables_test_/tables_test; 212s ...
You can reproduce this problem with:
git clone https://github.com/pingcap/tidb.git
cd tidb
bazel build //... --//build:with_nogo_flag=true --//build:with_rbe_flag=true
If this fails with Bazel 7 but succeeds with Bazel 6, I would expect this to be due to a Bazel bug or incompatible change. Could you share a Starlark profile so that we can get some idea of what's taking so long?
Edit: --experimental_remote_merkle_tree_cache is known to be quite problematic in certain cases. Could you try disabling it?
https://github.com/bazelbuild/bazel/issues/21626#issuecomment-2021876621
We hit the same issue. Either increase the cache size with --experimental_remote_merkle_tree_cache_size or disable the Merkle tree cache.
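As a rough sketch of the two workarounds suggested above, the .bazelrc could be changed in one of the following ways. The cache size of 10000 is only an illustrative value, not a recommendation from this thread, and the right number likely depends on the size of the build. Since the log shows --experimental_remote_merkle_tree_cache being set in both the workspace .bazelrc and /root/.bazelrc, the change would presumably need to be applied to (or override) both.

```
# Option 1: disable the Merkle tree cache entirely.
build --noexperimental_remote_merkle_tree_cache

# Option 2: keep the cache but make it larger.
# 10000 is an assumed, illustrative value; tune it for your build.
build --experimental_remote_merkle_tree_cache
build --experimental_remote_merkle_tree_cache_size=10000
```

Only one of the two options should be used at a time. If both are present, the option that appears last takes effect.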