Asynchronous sandbox cleanup leads to inode exhaustion with defaulted --experimental_sandbox_async_tree_delete_idle_threads=4
Description of the bug:
It seems that Bazel does not clean up the sandbox directories fast enough, which eventually leads to running out of inodes when performing a build.
Which category does this issue belong to?
Local Execution
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
I'm not sure what the simplest way to reproduce this bug is. It might be because I am running a 96 core machine with
build --jobs=92 --local_cpu_resources=92 --sandbox_base=/dev/shm
and lots of symlinks from C++ compilation on tmpfs. The documentation hints that deletion threads are limited to 1 during the build, which is probably the problem.
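One way to at least observe the symptom while such a build runs (a minimal sketch, assuming the same --sandbox_base=/dev/shm setup as above) is to watch inode usage on the tmpfs climb toward 100%:
watch -n 5 df -i /dev/shm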
Which operating system are you running Bazel on?
Linux 6.7.0-arch3-1 #1 SMP PREEMPT_DYNAMIC Sat, 13 Jan 2024 14:37:14 +0000 x86_64 GNU/Linux
What is the output of bazel info release?
release 7.0.1
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD?
No response
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
Probably d169556e1630c1ea2a670b8ecf405141356a504e which changed the default of experimental_sandbox_async_tree_delete_idle_threads to 4. I did not actually run the bisection, but I will note that 7.0.0-pre.20231011.2 did not have this problem and 7.0.1 does have the issue.
Have you found anything relevant by searching the web?
https://github.com/bazelbuild/bazel/issues/19266#issuecomment-1682828971
Any other information, logs, or outputs that you want to share?
It fails with an error message such as:
ERROR: /home/jeremiah/projects/REDACTED/BUILD:3:18: Compiling REDACTED.cpp failed: I/O exception during sandboxed execution: /dev/shm/bazel-sandbox.215dc078d692e667b99e0907ffd4a9aa9a5f6df28dfed5a7e2f2dcdc3fa54f19/linux-sandbox/8034/execroot/_main/external/boost/boost/metaparse/v1/error (No space left on device)
Inodes after failure:
Filesystem     Inodes IUsed IFree IUse% Mounted on
tmpfs             95M   95M  1.7K  100% /dev/shm
Inodes after manually removing sandbox directory:
Filesystem     Inodes IUsed IFree IUse% Mounted on
tmpfs             95M   543   95M    1% /dev/shm
I am using --sandbox_base=/dev/shm which is mounted as tmpfs.
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           378G  746M  377G    1% /dev/shm
After the build fails, the sandbox directory is slowly cleaned up unless Bazel is shut down, which is what originally alerted me to the cause.
It seems that this was probably caused by d169556e1630c1ea2a670b8ecf405141356a504e which enabled asynchronous sandbox cleanup by changing the default of experimental_sandbox_async_tree_delete_idle_threads to 4.
I confirmed that adding experimental_sandbox_async_tree_delete_idle_threads=0 fixes the issue for me.
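For anyone hitting this in the meantime, the workaround amounts to a single line in the bazelrc (a sketch, assuming a project-level .bazelrc like the build line shown above):
build --experimental_sandbox_async_tree_delete_idle_threads=0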
Cc @oquenchil
We experienced the same thing with JS tests with big runfiles trees and a /dev/shm sandbox base
Experiencing the same issue.
@bazel-io flag
@sgowroji This sounds more like an issue for Team-Local-Exec as it isn't language-specific, but related to sandboxing in general. I mentioned @oquenchil since he was involved in the flag "flip".
@bazel-io fork 7.1.0
@iancha1992 This may need to be picked into a 7.0.2 depending on assessed severity.
How many inodes are we talking about? I can't tell from the error which number refers to the inodes.
Depending on how widespread this is, we have two options: if few users are affected, they can pass --experimental_sandbox_async_tree_delete_idle_threads=0. If it's more widespread, we should un-flip the flag and make the default 0.
The long-term solution would be to keep track of how many inodes are free and block before planting any more symlinks. I won't rush that solution into 7.0.2, though.
The number of inodes on /dev/shm is 95M. Before removing the sandbox directory, there were 1700 free (it's not 0 because Bazel cleans some up before I check the number). After manually removing the sandbox directory, there are 95M free.
It takes around 65 seconds to run out of inodes when the build is running. Once the build fails, if I just leave Bazel alone to clean up the sandbox itself, it takes around 554 seconds to clean up the sandbox with --experimental_sandbox_async_tree_delete_idle_threads not mentioned in bazelrc (so defaulted to 4 at version 7.0.1).
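Taking those numbers at face value, a back-of-envelope comparison (assuming roughly the full 95M inodes are consumed and then removed in each phase):
echo "95000000 / 65" | bc     # ~1.46M inodes/s created while the build runs
echo "95000000 / 554" | bc    # ~171K inodes/s removed by the async cleanup
i.e. creation outpaces deletion by roughly 8-9x on this machine.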
In general it seems off to me that we generate inodes based on jobs (typically CPU count) but don't scale cleanup based on CPUs. So the more powerful your machine, the easier it is to have creation outpace deletion, right?
Agreed. With tmpfs, there would be a CPU bottleneck from in-memory operations. So the rate of symlink creation during the build will scale up to the CPU and kernel limits, but the rate of symlink deletion during the build is capped at 1 core (according to the documentation).
The docs state:
[--experimental_sandbox_async_tree_delete_idle_threads] If 0, delete sandbox trees as soon as an action completes (causing completion of the action to be delayed). If greater than zero, execute the deletion of such trees on an asynchronous thread pool that has size 1 when the build is running and grows to the size specified by this flag when the server is idle.
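Concretely, the two settings discussed in this thread would look like this in a bazelrc (an illustrative sketch; 4 is simply the 7.0.1 default mentioned above):
build --experimental_sandbox_async_tree_delete_idle_threads=0   # delete each sandbox tree synchronously when its action finishes
build --experimental_sandbox_async_tree_delete_idle_threads=4   # async deletion: 1 thread while the build runs, up to 4 when the server is idle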
The reason you give sounds logical. A single deletion thread cannot keep up.
Something else I was wondering: do you have --reuse_sandbox_directories enabled? That should in theory reduce the number of deletions necessary. It was also enabled by default for 7.0, so I guess you do unless you disabled it explicitly.
I still don't have a sense of how much of an edge case your setup is, which I need in order to evaluate whether the flag should be unflipped immediately or whether affected users should just turn it off themselves. I can start working on a proper fix today, but I don't want this to block the release.
Would this affect most JS users?
@oquenchil The 7.0.2 release has already happened, so there is no risk of blocking it. :-)
I do not have --reuse_sandbox_directories disabled. Perhaps the others can add some additional information to give a sense of the scope of the problem.
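For anyone else wanting to quantify the impact on their setup, a rough A/B along these lines should give a sense of it (a sketch; //... stands in for whatever targets are affected):
bazel clean && bazel build //... --profile=/tmp/async_default.gz
bazel clean && bazel build //... --profile=/tmp/async_off.gz --experimental_sandbox_async_tree_delete_idle_threads=0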
We also have the reuse flag enabled. (As an aside, I am a little confused by the sandbox reuse. It works per-mnemonic but all tests share one so it seems easy to ping-pong a sandbox between different runfiles trees, potentially with little overlap. I'd imagine it works better for build actions where you have different mnemonics and a large toolchain gets consistently reused for every action with given mnemonic.)
Re "between different runfiles trees": coincidentally I realized this yesterday evening and I'm implementing a fix now.
There is another issue in the current code. The second parameter here should be the sandbox base. Otherwise in cases like yours where you pass a different filesystem to the sandbox base, the stashing becomes copying.
Once I get these two changes in, execution should be much faster, on top of fixing the inode exhaustion problem.
Any update? Should this still be a release blocker for 7.1?
Almost done. Is this the last blocker?
Nope, there are still a number of them, I'm just doing a round of review.
A fix for this issue has been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!
I tested a full rebuild of my project with Bazel 7.1.0 RC1. The result is that the build is now able to complete without specifying --experimental_sandbox_async_tree_delete_idle_threads=0. Inode usage gets up to 51%, but at least the inodes are no longer fully exhausted before the build completes. For comparison, when disabling async tree deletion, inode usage only gets up to around 12%. So the situation has improved a lot, at least in my use case.
Glad to hear it, Jeremiah. Could you please comment on whether the build/tests got any faster? With the reuse changes it should have gotten much faster, so it'd be disappointing to hear it didn't.
You can build/test with --profile=/tmp/profile.gz.
Then get the time with:
zcat /tmp/profile.gz | jq '.traceEvents | .[] | select(.name == "sandbox.createFileSystem") | .dur' | paste -sd+ - | bc
At least there I'd expect to see an improvement.
More info about profiling: https://bazel.build/advanced/performance/json-trace-profile
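If the raw sum is hard to interpret, a small variation of the same query converts it to seconds (assuming the profile follows the Chrome trace event format, where .dur is in microseconds):
zcat /tmp/profile.gz | jq '[.traceEvents[] | select(.name == "sandbox.createFileSystem") | .dur] | add / 1e6'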
I ran a few full rebuilds of my project:
- Bazel 7.0.2 + --experimental_sandbox_async_tree_delete_idle_threads=0
  - Completes in 2254 seconds
  - Output of your command: 20925239343
- Bazel 7.1.0rc1 + --experimental_sandbox_async_tree_delete_idle_threads=0
  - Completes in 1920 seconds
  - Output of your command: 7356170441
- Bazel 7.1.0rc1
  - Completes in 1858 seconds
  - Output of your command: 5730664498
I'm not sure what units Bazel's profiling uses, but I pasted the output at your request.
As far as I can tell, it shaved a few minutes off my build. That's pretty good, considering I have a 1167-second critical path consisting of a clang compiler build plus an emscripten cache build. So I'm sure it helps incremental builds even more. Nice work!
Thank you, Jeremiah. That's still quite a bit of overhead, but we'll add more improvements.