Bazel Causing Server to Become Unresponsive
Description of the bug:
After the Bazel build had been running for some time, my server became completely unresponsive. I tried to limit resource usage with the following options:
--jobs=8 --local_cpu_resources=HOST_CPUS*.5 --local_ram_resources=HOST_RAM*.5
However, this had no effect: the server still froze. My server has 127 CPU cores, and during the build the progress output shows "127 actions, 127 running". Strangely, it still shows 127 actions running even with the options above. How can I properly limit Bazel's resource usage to keep the server from crashing? I can't provide more details on system resource usage because the server freezes completely during the build.
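Because the machine freezes before anything can be inspected, one option is to write memory samples to disk at a fixed interval so the last readings survive a forced reboot. A minimal sketch follows, assuming Linux (it reads MemAvailable and SwapFree from /proc/meminfo); the function name log_memory and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append periodic memory samples to a log file so the
# data is already on disk if the machine freezes during the build.
# Assumes Linux: MemAvailable and SwapFree are read from /proc/meminfo (in kB).
set -eu

# log_memory LOGFILE SAMPLES INTERVAL_SECONDS
log_memory() {
  logfile=$1
  samples=$2
  interval=$3
  i=0
  while [ "$i" -lt "$samples" ]; do
    {
      date '+%F %T'
      awk '/^(MemAvailable|SwapFree):/ { print "  " $1, $2, $3 }' /proc/meminfo
    } >> "$logfile"
    sync   # flush buffered writes to disk before a possible hard freeze
    sleep "$interval"
    i=$((i + 1))
  done
}
```

For example, source the snippet in a second terminal and run `log_memory "$HOME/bazel-mem.log" 100000 5 &` before starting the build; after rebooting, the tail of the log shows how quickly available memory fell. Calling sync after every sample costs a little I/O but improves the chance that the final entries actually reach the disk.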
Which category does this issue belong to?
No response
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
No response
Which operating system are you running Bazel on?
openEuler for riscv64
What is the output of bazel info release?
release 6.5.0
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse HEAD ?
No response
If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.
No response
Have you found anything relevant by searching the web?
https://github.com/bazelbuild/bazel/issues/11868
Any other information, logs, or outputs that you want to share?
No response
I found the cause of the server crash: Bazel used all of the system's memory without any limit, which eventually exhausted resources and crashed the server. How can I fix this? I set --local_ram_resources=HOST_RAM*.5, but it didn't help.
Maybe I should use --host_jvm_args=-Xmx64g instead (the system has 121 GB of memory)?
Please help me.
Can you share a JSON trace profile? Also, can you use --announce_rc to see whether there are any other flags that could be relevant?
Are you using the embedded JDK or do you specify one yourself?
Sorry for the late reply. I didn't specify a JDK for Bazel. One notable point is that my server has a RISC-V CPU, and a colleague built Bazel 6.5.0 for that architecture, so the issue might stem from our Bazel being an unofficial build. Later, I limited the build to --jobs=4, which kept memory usage below the machine's physical memory.
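A tentative note on why the earlier flags may not have helped: --local_ram_resources sets the RAM budget Bazel's scheduler works against, but the scheduler compares that budget with per-action estimates rather than measuring what each gcc process actually uses, so heavy C++ compiles can still exceed physical memory. Likewise, --host_jvm_args=-Xmx<size> caps only the Bazel server's Java heap (the --announce_rc output below shows envoy.bazelrc already sets -Xmx3g), not the compilers Bazel launches. Capping --jobs bounds concurrency directly. The sketch below derives a --jobs value from available memory; PER_JOB_MB (default 4096) is an assumed per-action budget, not a Bazel setting, and the helper jobs_for_ram is hypothetical.

```shell
#!/bin/sh
# Sketch: suggest a --jobs value from currently available memory.
# Assumes Linux (/proc/meminfo) and a hypothetical per-action budget
# PER_JOB_MB; measure your heaviest compile actions before trusting it.
set -eu

# jobs_for_ram AVAIL_MB PER_JOB_MB: print AVAIL_MB / PER_JOB_MB, at least 1.
# PER_JOB_MB must be a positive integer.
jobs_for_ram() {
  n=$(( $1 / $2 ))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}

per_job_mb=${PER_JOB_MB:-4096}
# MemAvailable is reported in kB; convert to MB.
avail_mb=$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo 2>/dev/null || true)

if [ -n "$avail_mb" ]; then
  echo "--jobs=$(jobs_for_ram "$avail_mb" "$per_job_mb")"
else
  echo "MemAvailable not found in /proc/meminfo" >&2
fi
```

For example, with the snippet saved as a hypothetical jobs_from_ram.sh, `bazel build "$(sh jobs_from_ram.sh)" //...` would run with the suggested job count. The budget should be set from observed peak usage of the largest translation units rather than from the default shown here.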
With --announce_rc added, the output of bazel build is as follows:
[zjq@openeuler-riscv-4-4 proxy]$ make build
Starting local Bazel server and connecting to it...
export PATH=/usr/lib/llvm-10/bin:/home/zjq/riscv_istio_test/go_golang/go/bin:/home/zjq/riscv_istio_test/go_golang/golang/bin:/home/zjq/.cargo/bin:/home/zjq/.wasmtime/bin:/home/zjq/local/bazel/bin:/home/zjq/.local/bin:/home/zjq/bin:/home/zjq/.cabal/bin:/home/zjq/build_factory/GHC/cabal/cabal2/cabal/_build/bin:/usr/local/bin:/usr/bin CC=gcc CXX=g++ && \
bazel build --announce_rc --fission=no --local_cpu_resources=50 --local_ram_resources=32768 --jobs=4 //...
INFO: Reading 'startup' options from /home/zjq/build_factory/proxy/proxy/envoy.bazelrc: --host_jvm_args=-Xmx3g
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=149
INFO: Reading rc options for 'build' from /home/zjq/build_factory/proxy/proxy/envoy.bazelrc:
Inherited 'common' options: --experimental_allow_tags_propagation
INFO: Reading rc options for 'build' from /home/zjq/build_factory/proxy/proxy/envoy.bazelrc:
'build' options: --color=yes --workspace_status_command=bash bazel/get_workspace_status --incompatible_strict_action_env --java_runtime_version=remotejdk_11 --tool_java_runtime_version=remotejdk_11 --platform_mappings=bazel/platform_mappings --copt=-DABSL_MIN_LOG_LEVEL=4 --define envoy_mobile_listener=enabled --experimental_repository_downloader_retries=2 --action_env=CC --host_action_env=CC --action_env=CXX --host_action_env=CXX --action_env=LLVM_CONFIG --host_action_env=LLVM_CONFIG --action_env=PATH --host_action_env=PATH --action_env=BAZEL_VOLATILE_DIRTY --host_action_env=BAZEL_VOLATILE_DIRTY --action_env=BAZEL_FAKE_SCM_REVISION --host_action_env=BAZEL_FAKE_SCM_REVISION --enable_platform_specific_config --test_summary=terse --incompatible_config_setting_private_default_visibility --incompatible_enforce_config_setting_visibility --define absl=1 --@com_googlesource_googleurl//build_config:system_icu=0 --test_env=HEAPCHECK=normal --test_env=PPROF_PATH
INFO: Reading rc options for 'build' from /home/zjq/build_factory/proxy/proxy/.bazelrc:
'build' options: --workspace_status_command=bazel/bazel_get_workspace_status --define path_normalization_by_default=true --define tcmalloc=gperftools --define wasm=v8 --copt -DNULL_PLUGIN --cxxopt -Wformat --cxxopt -Wformat-security --host_linkopt=-pthread --action_env=CXXFLAGS=-Wno-unused-variable
INFO: Found applicable config definition build:linux in file /home/zjq/build_factory/proxy/proxy/envoy.bazelrc: --copt=-fPIC --copt=-Wno-deprecated-declarations --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --conlyopt=-fexceptions --fission=dbg,opt --features=per_object_debug_info --action_env=BAZEL_LINKLIBS=-l%:libstdc++.a --action_env=BAZEL_LINKOPTS=-lm --per_file_copt=external/com_github_datadog_dd_opentracing_cpp/.*.cpp@-Wno-type-limits
Can you share a JSON trace profile?