Bazel Causing Server to Become Unresponsive

Open Boring545 opened this issue 1 year ago • 1 comments

Description of the bug:

During a Bazel build, my server became completely unresponsive after the build had been running for some time. I tried to limit resource usage by specifying the following options:

--jobs=8 --local_cpu_resources=HOST_CPUS*.5 --local_ram_resources=HOST_RAM*.5

However, this had no effect: the server still froze. My server has 127 CPU cores, and during the build the status line shows "127 actions, 127 running". Strangely, it still shows 127 actions running even with the above flags set. How can I properly limit Bazel's resource usage to prevent the server from crashing? I can't provide more detail on system resource usage because the server freezes completely during the build.
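
For reference, `--jobs` caps the number of concurrently running actions, while `--local_cpu_resources` and `--local_ram_resources` feed the local scheduler's estimates. A status line that still reads "127 running" suggests the flags may not have reached Bazel at all, for example when the build is launched through a wrapper such as the `make build` target shown later in this thread; that explanation is a guess, not something confirmed here. The sketch below assumes a Linux host (`nproc`, `/proc/meminfo`) and turns "half the machine" into explicit numbers, then prints the command instead of running it so the values can be reviewed first. The variable names are illustrative.

```shell
#!/bin/sh
# Sketch: derive explicit limits so the values that reach Bazel are visible.
# Assumes Linux (nproc and /proc/meminfo are available).

cpus=$(nproc)
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

# Half of the CPUs, but never less than 1.
half_cpus=$(( cpus / 2 ))
if [ "$half_cpus" -lt 1 ]; then half_cpus=1; fi

# --local_ram_resources accepts an integer number of megabytes.
half_ram_mb=$(( mem_kb / 1024 / 2 ))

limits="--jobs=8 --local_cpu_resources=${half_cpus} --local_ram_resources=${half_ram_mb}"

# Print rather than run, so the command can be checked before starting a build.
echo "bazel build ${limits} //..."
```

If the printed command is run by hand, adding `--announce_rc` shows which options Bazel actually picked up from the command line and the rc files.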

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

openeuler for riscv64

What is the output of bazel info release?

release 6.5.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

https://github.com/bazelbuild/bazel/issues/11868

Any other information, logs, or outputs that you want to share?

No response

Boring545 • Oct 10 '24 06:10

I found the cause of the server crash: Bazel was using all the memory in the system without limits, which eventually led to resource exhaustion and the server crashing. How can I fix this issue? I set --local_ram_resources=HOST_RAM*.5, but it didn't help. Maybe I should use the parameter --host_jvm_args=-Xmx64g (the system has 121GB of memory). Please help me.

Boring545 • Oct 11 '24 09:10
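
A note on the two flags mentioned above, as far as Bazel's option model goes: `--host_jvm_args` is a startup option that sizes the heap of the Bazel server JVM, so `-Xmx` there does not limit the compilers and linkers that Bazel spawns. `--local_ram_resources` is an estimate used for scheduling local actions rather than an enforced cap. The sketch below only illustrates where each kind of option goes on the command line; the values are the ones that appear in the `--announce_rc` output later in this thread, and the command is printed, not executed.

```shell
#!/bin/sh
# Startup options go BEFORE the command name and configure the Bazel server JVM.
startup_opts="--host_jvm_args=-Xmx3g"

# Command options go AFTER the command name and configure this build.
build_opts="--jobs=4 --local_ram_resources=32768"

cmd="bazel ${startup_opts} build ${build_opts} //..."

# Print the assembled command for review instead of running it.
echo "$cmd"
```

Changing a startup option restarts the Bazel server, so it is usually set once in an rc file (as the `startup` line in envoy.bazelrc does here) rather than varied per build.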

Can you share a JSON trace profile? Also, can you use --announce_rc to see whether there are any other flags that could be relevant?

meisterT • Oct 22 '24 09:10
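
A minimal sketch of how the requested data could be collected, assuming the build can be completed with a low `--jobs` value so that the machine stays responsive long enough to write the profile. `--profile` and `--announce_rc` are existing Bazel options and `analyze-profile` is an existing command; the output path is a hypothetical choice. The commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical location for the JSON trace profile.
profile_path="/tmp/bazel_profile.gz"

# --announce_rc prints the options read from rc files;
# --profile writes the JSON trace profile to the given path.
build_cmd="bazel build --announce_rc --jobs=4 --profile=${profile_path} //..."

# analyze-profile prints a summary of a previously written profile.
analyze_cmd="bazel analyze-profile ${profile_path}"

echo "$build_cmd"
echo "$analyze_cmd"
```

The resulting file can also be opened in a trace viewer such as chrome://tracing to see which actions ran concurrently.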

Are you using the embedded JDK or do you specify one yourself?

meisterT • Oct 22 '24 09:10

I'm sorry for the late reply. I didn't specify any JDK for Bazel. One notable detail is that my server has a RISC-V CPU, and my colleague built Bazel 6.5.0 for that architecture. Since our Bazel is not an official release build, that may be the source of this problem. Later, I ran bazel build with --jobs=4, which kept memory usage from exceeding the machine's physical memory.

Boring545 • Oct 25 '24 10:10
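
The observation that `--jobs=4` kept the build within memory fits a simple budget: peak memory is roughly the number of concurrent actions times the peak memory of one action. The sketch below turns that into a rough upper bound for `--jobs`. The 121 GB figure comes from this thread; the per-action peak of 8192 MB is a placeholder, not a measurement, and `max_jobs` is a hypothetical helper name.

```shell
#!/bin/sh
# max_jobs BUDGET_MB PER_ACTION_MB
# Prints how many actions fit into the memory budget (at least 1).
max_jobs() {
  budget_mb=$1
  per_action_mb=$2
  jobs=$(( budget_mb / per_action_mb ))
  if [ "$jobs" -lt 1 ]; then jobs=1; fi
  echo "$jobs"
}

# 121 GB host, half of it budgeted for the build: 121 * 1024 / 2 = 61952 MB.
# 8192 MB per action is a placeholder; measure the real peak before relying on it.
max_jobs 61952 8192   # prints 7
```

Large link steps or heavy C++ translation units can use far more memory than the average action, so a measured worst case gives a safer bound than an average.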

After adding the --announce_rc parameter, the output of executing bazel build is as follows:

[zjq@openeuler-riscv-4-4 proxy]$ make build
Starting local Bazel server and connecting to it...
export PATH=/usr/lib/llvm-10/bin:/home/zjq/riscv_istio_test/go_golang/go/bin:/home/zjq/riscv_istio_test/go_golang/golang/bin:/home/zjq/.cargo/bin:/home/zjq/.wasmtime/bin:/home/zjq/local/bazel/bin:/home/zjq/.local/bin:/home/zjq/bin:/home/zjq/.cabal/bin:/home/zjq/build_factory/GHC/cabal/cabal2/cabal/_build/bin:/usr/local/bin:/usr/bin CC=gcc CXX=g++ && \
bazel  build  --announce_rc --fission=no --local_cpu_resources=50 --local_ram_resources=32768 --jobs=4   //...
INFO: Reading 'startup' options from /home/zjq/build_factory/proxy/proxy/envoy.bazelrc: --host_jvm_args=-Xmx3g
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=149
INFO: Reading rc options for 'build' from /home/zjq/build_factory/proxy/proxy/envoy.bazelrc:
  Inherited 'common' options: --experimental_allow_tags_propagation
INFO: Reading rc options for 'build' from /home/zjq/build_factory/proxy/proxy/envoy.bazelrc:
  'build' options: --color=yes --workspace_status_command=bash bazel/get_workspace_status --incompatible_strict_action_env --java_runtime_version=remotejdk_11 --tool_java_runtime_version=remotejdk_11 --platform_mappings=bazel/platform_mappings --copt=-DABSL_MIN_LOG_LEVEL=4 --define envoy_mobile_listener=enabled --experimental_repository_downloader_retries=2 --action_env=CC --host_action_env=CC --action_env=CXX --host_action_env=CXX --action_env=LLVM_CONFIG --host_action_env=LLVM_CONFIG --action_env=PATH --host_action_env=PATH --action_env=BAZEL_VOLATILE_DIRTY --host_action_env=BAZEL_VOLATILE_DIRTY --action_env=BAZEL_FAKE_SCM_REVISION --host_action_env=BAZEL_FAKE_SCM_REVISION --enable_platform_specific_config --test_summary=terse --incompatible_config_setting_private_default_visibility --incompatible_enforce_config_setting_visibility --define absl=1 --@com_googlesource_googleurl//build_config:system_icu=0 --test_env=HEAPCHECK=normal --test_env=PPROF_PATH
INFO: Reading rc options for 'build' from /home/zjq/build_factory/proxy/proxy/.bazelrc:
  'build' options: --workspace_status_command=bazel/bazel_get_workspace_status --define path_normalization_by_default=true --define tcmalloc=gperftools --define wasm=v8 --copt -DNULL_PLUGIN --cxxopt -Wformat --cxxopt -Wformat-security --host_linkopt=-pthread --action_env=CXXFLAGS=-Wno-unused-variable
INFO: Found applicable config definition build:linux in file /home/zjq/build_factory/proxy/proxy/envoy.bazelrc: --copt=-fPIC --copt=-Wno-deprecated-declarations --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --conlyopt=-fexceptions --fission=dbg,opt --features=per_object_debug_info --action_env=BAZEL_LINKLIBS=-l%:libstdc++.a --action_env=BAZEL_LINKOPTS=-lm --per_file_copt=external/com_github_datadog_dd_opentracing_cpp/.*.cpp@-Wno-type-limits

Boring545 • Oct 25 '24 10:10

Can you share a JSON trace profile?

meisterT • Nov 11 '24 12:11