rules_go

GoLink actions wait too long with Bazel 7 and remote execution

Open · hawkingrei opened this issue 1 year ago · 3 comments

What version of rules_go are you using?

v0.50.1

What version of gazelle are you using?

v0.38.0

What version of Bazel are you using?

[root@10-2-12-124 tidb]# bazelisk version
Bazelisk version: v1.17.0
Starting local Bazel server and connecting to it...
Build label: 7.3.1
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Aug 19 16:12:50 2024 (1724083970)
Build timestamp: 1724083970
Build timestamp as int: 1724083970

Does this issue reproduce with the latest releases of all the above?

bazel build //...

With remote execution (bazel-buildfarm v2.11.1) and the following rc options:

build --announce_rc
build --experimental_guard_against_concurrent_changes
build --experimental_remote_merkle_tree_cache
build --disk_cache=/data1/bazel/cache
build --experimental_remote_cache_compression
build --repository_cache=/data1/bazel/cache
run --color=yes
build --remote_executor=grpc://xxxx:8980
build --jobs 100

What operating system and processor architecture are you using?

Any other potentially useful information about your toolchain?

What did you do?

What did you expect to see?

The build should finish, as it does with Bazel 6.5.

https://tiprow.hawkingrei.com/view/gs/pingcapprow/pr-logs/pull/pingcap_tidb/51126/fast_test_tiprow/1838293257872740352#1:build-log.txt%3A11

What did you see instead?

I waited a long time: the build never finishes and never reports an error.

Starting local Bazel server and connecting to it...
INFO: Invocation ID: aa345b60-b676-4ecb-bca1-12c92e87a140
INFO: Reading 'startup' options from /data3/wangweizhen/tidb/.bazelrc: --host_jvm_args=-Xmx4g, --unlimit_coredumps
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=148
INFO: Reading rc options for 'build' from /data3/wangweizhen/tidb/.bazelrc:
  'build' options: --announce_rc --experimental_guard_against_concurrent_changes --experimental_remote_merkle_tree_cache --java_language_version=17 --java_runtime_version=17 --tool_java_language_version=17 --tool_java_runtime_version=17 --incompatible_strict_action_env --incompatible_enable_cc_toolchain_resolution
INFO: Reading rc options for 'build' from /root/.bazelrc:
  'build' options: --announce_rc --experimental_guard_against_concurrent_changes --experimental_remote_merkle_tree_cache --disk_cache=/data1/bazel/cache --experimental_remote_cache_compression --repository_cache=/data1/bazel/cache --remote_executor=grpc://10.2.12.124:8980 --jobs 100
WARNING: Found stale downloads from previous build, deleting...
INFO: Analyzed 1202 targets (2310 packages loaded, 29891 targets configured).
[23,322 / 24,720] 100 actions, 0 running
    [Prepa] GoLink br/pkg/checkpoint/checkpoint_test_/checkpoint_test; 213s
    [Prepa] GoLink pkg/lightning/backend/kv/kv_test_/kv_test; 213s
    [Prepa] GoLink pkg/sessionctx/stmtctx/stmtctx_test_/stmtctx_test; 212s
    [Prepa] GoLink pkg/sessionctx/sessionstates/sessionstates_test_/sessionstates_test; 212s
    [Prepa] GoLink pkg/meta/meta_test_/meta_test; 212s
    [Prepa] GoLink pkg/statistics/statistics_test_/statistics_test; 212s
    [Prepa] GoLink pkg/extension/extension_test_/extension_test; 212s
    [Prepa] GoLink pkg/table/tables/tables_test_/tables_test; 212s ...

hawkingrei, Sep 23 '24 19:09

You can reproduce this problem with:

git clone https://github.com/pingcap/tidb.git
cd tidb
bazel build //... --//build:with_nogo_flag=true --//build:with_rbe_flag=true

hawkingrei, Sep 23 '24 19:09

If this fails with Bazel 7 but succeeds with Bazel 6, I would expect this to be due to a Bazel bug or incompatible change. Could you share a Starlark profile so that we can get some idea of what's taking so long?

Edit: --experimental_remote_merkle_tree_cache is known to be quite problematic in certain cases. Could you try disabling it?
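A minimal .bazelrc sketch of both requests. It assumes that the JSON trace profile written by `--profile` is the kind of profile being asked for, and the output path `/tmp/bazel-profile.json.gz` is a hypothetical placeholder. The log above shows `--experimental_remote_merkle_tree_cache` being set in both `/data3/wangweizhen/tidb/.bazelrc` and `/root/.bazelrc`, so the override has to come after both (for example at the end of `/root/.bazelrc`, or on the command line), because a later value of the same flag wins.

```
# Hypothetical additions at the end of /root/.bazelrc.
# Write a JSON trace profile of the invocation (path is a placeholder).
build --profile=/tmp/bazel-profile.json.gz
# Turn the Merkle tree cache off; this overrides the earlier
# --experimental_remote_merkle_tree_cache set in both rc files.
build --noexperimental_remote_merkle_tree_cache
```

The resulting file can be summarized with `bazel analyze-profile /tmp/bazel-profile.json.gz` or opened in a trace viewer such as chrome://tracing.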

fmeum, Sep 25 '24 11:09

https://github.com/bazelbuild/bazel/issues/21626#issuecomment-2021876621

We hit the same issue. Either increase the cache size with --experimental_remote_merkle_tree_cache_size or disable the Merkle tree cache.
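The first workaround as a .bazelrc sketch. The value 10000 is an arbitrary illustration, not a tested or recommended size.

```
# Keep the Merkle tree cache but let it hold more entries.
# 10000 is an example value only; tune it for the workspace.
build --experimental_remote_merkle_tree_cache
build --experimental_remote_merkle_tree_cache_size=10000
```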

arjantop-cai, Oct 16 '24 22:10