
Asynchronous sandbox cleanup leads to inode exhaustion with defaulted --experimental_sandbox_async_tree_delete_idle_threads=4

Open jeremiahar opened this issue 7 months ago • 20 comments

Description of the bug:

It seems that Bazel does not clean up the sandbox directories fast enough, which eventually leads to running out of inodes when performing a build.

Which category does this issue belong to?

Local Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I'm not sure what the simplest way to reproduce this bug is. It might be because I am running a 96-core machine with

build --jobs=92 --local_cpu_resources=92 --sandbox_base=/dev/shm

and lots of symlinks from C++ compilation on tmpfs. The documentation hints that deletion threads are limited to 1 while the build is running, which is probably the problem.

Which operating system are you running Bazel on?

Linux 6.7.0-arch3-1 #1 SMP PREEMPT_DYNAMIC Sat, 13 Jan 2024 14:37:14 +0000 x86_64 GNU/Linux

What is the output of bazel info release?

release 7.0.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

Probably d169556e1630c1ea2a670b8ecf405141356a504e which changed the default of experimental_sandbox_async_tree_delete_idle_threads to 4. I did not actually run the bisection, but I will note that 7.0.0-pre.20231011.2 did not have this problem and 7.0.1 does have the issue.

Have you found anything relevant by searching the web?

https://github.com/bazelbuild/bazel/issues/19266#issuecomment-1682828971

Any other information, logs, or outputs that you want to share?

It fails with an error message such as:

ERROR: /home/jeremiah/projects/REDACTED/BUILD:3:18: Compiling REDACTED.cpp failed: I/O exception during sandboxed execution: /dev/shm/bazel-sandbox.215dc078d692e667b99e0907ffd4a9aa9a5f6df28dfed5a7e2f2dcdc3fa54f19/linux-sandbox/8034/execroot/_main/external/boost/boost/metaparse/v1/error (No space left on device)

Inodes after failure:

Filesystem  Inodes  IUsed  IFree  IUse%  Mounted on
tmpfs          95M    95M   1.7K   100%  /dev/shm

Inodes after manually removing sandbox directory:

Filesystem  Inodes  IUsed  IFree  IUse%  Mounted on
tmpfs          95M    543    95M     1%  /dev/shm

I am using --sandbox_base=/dev/shm which is mounted as tmpfs.

Filesystem  Size  Used  Avail  Use%  Mounted on
tmpfs       378G  746M   377G    1%  /dev/shm
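For reference, the tables above come from df -i and df -h. During a build, inode usage can be watched live with something like the following (the one-second refresh interval is an arbitrary choice):

watch -n1 df -i /dev/shm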

After the build fails, the sandbox directory is slowly cleaned up, unless Bazel is shut down; this is what originally alerted me to the cause.

It seems that this was probably caused by d169556e1630c1ea2a670b8ecf405141356a504e which enabled asynchronous sandbox cleanup by changing the default of experimental_sandbox_async_tree_delete_idle_threads to 4.

I confirmed that adding --experimental_sandbox_async_tree_delete_idle_threads=0 fixes the issue for me.
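In .bazelrc form, the workaround is:

# Delete each sandbox tree synchronously as its action finishes, instead of
# queuing it on the async deletion pool.
build --experimental_sandbox_async_tree_delete_idle_threads=0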

jeremiahar avatar Jan 20 '24 21:01 jeremiahar

Cc @oquenchil

fmeum avatar Jan 20 '24 22:01 fmeum

We experienced the same thing with JS tests with big runfiles trees and a /dev/shm sandbox base

dzbarsky avatar Jan 21 '24 22:01 dzbarsky

Experiencing the same issue

ewianda avatar Jan 23 '24 00:01 ewianda

@bazel-io flag

fmeum avatar Jan 23 '24 08:01 fmeum

@sgowroji This sounds more like an issue for Team-Local-Exec as it isn't language-specific, but related to sandboxing in general. I mentioned @oquenchil since he was involved in the flag "flip".

fmeum avatar Jan 23 '24 08:01 fmeum

@bazel-io fork 7.1.0

iancha1992 avatar Jan 23 '24 18:01 iancha1992

@iancha1992 This may need to be picked into a 7.0.2 depending on assessed severity.

fmeum avatar Jan 23 '24 18:01 fmeum

How many inodes are we talking about? I can't tell from the output which number is the inode count.

Depending on how widespread this is, we have two options: if few users are affected, they can pass --experimental_sandbox_async_tree_delete_idle_threads=0; if it's more widespread, we should un-flip the flag and make the default 0.

The long-term solution would be to keep track of how many inodes are free and block before planting any more symlinks. I won't rush that solution into 7.0.2 though.
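As a rough shell illustration of the idea (not Bazel code; MIN_FREE is an arbitrary threshold):

# Block until the sandbox filesystem has at least MIN_FREE inodes free,
# i.e. until async deletion has caught up enough to safely plant more symlinks.
MIN_FREE=1000000
while [ "$(df --output=iavail /dev/shm | awk 'NR==2 {print $1}')" -lt "$MIN_FREE" ]; do
  sleep 1
done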

oquenchil avatar Jan 23 '24 20:01 oquenchil

The number of inodes on /dev/shm is 95M. Before removing the sandbox directory, there were 1700 free (it's not 0 because Bazel cleans some up before I check the number). After manually removing the sandbox directory, there are 95M free.

jeremiahar avatar Jan 23 '24 23:01 jeremiahar

It takes around 65 seconds to run out of inodes when the build is running. Once the build fails, if I just leave Bazel alone to clean up the sandbox itself, it takes around 554 seconds to clean up the sandbox with --experimental_sandbox_async_tree_delete_idle_threads not mentioned in bazelrc (so defaulted to 4 at version 7.0.1).
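Back-of-envelope from those numbers: filling ~95M inodes in 65 seconds is a net creation rate of roughly 1.5M inodes per second, while draining them in 554 seconds is only about 170K per second, so creation outpaces the single deletion thread by nearly an order of magnitude.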

jeremiahar avatar Jan 23 '24 23:01 jeremiahar

In general it seems off to me that we generate inodes based on jobs (typically CPU count) but don't scale cleanup based on CPUs. So the more powerful your machine, the easier it is to have creation outpace deletion, right?


dzbarsky avatar Jan 24 '24 01:01 dzbarsky

In general it seems off to me that we generate inodes based on jobs (typically CPU count) but don't scale cleanup based on CPUs. So the more powerful your machine, the easier it is to have creation outpace deletion, right?

Agreed. With tmpfs, the bottleneck for in-memory operations is the CPU, so the rate of symlink creation during the build scales up to the CPU and kernel limits, while the rate of symlink deletion during the build is capped at a single thread (according to the documentation).

The docs state:

[--experimental_sandbox_async_tree_delete_idle_threads] If 0, delete sandbox trees as soon as an action completes (causing completion of the action to be delayed). If greater than zero, execute the deletion of such trees on an asynchronous thread pool that has size 1 when the build is running and grows to the size specified by this flag when the server is idle.

jeremiahar avatar Jan 24 '24 01:01 jeremiahar

The reason you give sounds logical. A single deletion thread cannot keep up.

Something else I was wondering, do you have --reuse_sandbox_directories enabled? That should in theory reduce the number of deletions necessary. It was also enabled by default for 7.0 so I guess you do unless you disabled it explicitly.

I still don't have a sense of how much of an edge case your setup is, in order to evaluate whether the flag should be unflipped immediately or whether affected users should turn it off themselves. I can start working on a proper fix today, but I don't want this to block the release.

Would this affect most JS users?

oquenchil avatar Jan 30 '24 11:01 oquenchil

@oquenchil The 7.0.2 release has already happened, so there is no risk of blocking it. :-)

fmeum avatar Jan 30 '24 11:01 fmeum


I do not have --reuse_sandbox_directories disabled, so it is enabled. Perhaps the others can add some additional information to give a sense of the scope of the problem.

jeremiahar avatar Jan 30 '24 11:01 jeremiahar

We also have the reuse flag enabled. (As an aside, I am a little confused by the sandbox reuse. It works per-mnemonic, but all tests share one mnemonic, so it seems easy to ping-pong a sandbox between different runfiles trees, potentially with little overlap. I'd imagine it works better for build actions, where you have different mnemonics and a large toolchain gets consistently reused for every action with a given mnemonic.)

dzbarsky avatar Feb 01 '24 05:02 dzbarsky

between different runfiles trees,

Coincidentally I realized this yesterday evening and I'm implementing a fix now.

There is another issue in the current code: the second parameter here should be the sandbox base. Otherwise, in cases like yours where the sandbox base is on a different filesystem, the stashing becomes copying.

Once I get these two changes in, execution should be much faster, in addition to fixing the inode exhaustion problem.
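As a side note, a quick shell illustration of why that matters (not Bazel code; paths are hypothetical): rename(2) cannot cross filesystem boundaries, so moving a sandbox stash to a different filesystem degrades from a constant-time rename into a full copy.

mv /dev/shm/stash /dev/shm/stash2   # same filesystem: a pure rename, constant time
mv /dev/shm/stash /tmp/stash        # crosses filesystems: rename(2) fails with EXDEV, so mv copies the whole tree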

oquenchil avatar Feb 01 '24 08:02 oquenchil

Any update? Should this still be a release blocker for 7.1?

meteorcloudy avatar Feb 13 '24 15:02 meteorcloudy

Almost done. Is this the last blocker?

oquenchil avatar Feb 13 '24 16:02 oquenchil

Nope, there are still a number of them; I'm just doing a round of review.

meteorcloudy avatar Feb 13 '24 16:02 meteorcloudy

A fix for this issue has been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!

iancha1992 avatar Feb 22 '24 21:02 iancha1992

I tested a full rebuild of my project with Bazel 7.1.0 RC1. The build is now able to complete without specifying --experimental_sandbox_async_tree_delete_idle_threads=0. Inode usage peaks at 51%, but at least inodes are no longer fully exhausted before the build completes. For comparison, with async tree deletion disabled, inode usage only reaches around 12%. So the situation has improved a lot, at least in my use case.

jeremiahar avatar Feb 23 '24 04:02 jeremiahar

Glad to hear it, Jeremiah. Could you please comment on whether the build/tests got any faster? With the reuse changes it should have gotten much faster, so it'd be disappointing to hear it didn't.

oquenchil avatar Feb 23 '24 09:02 oquenchil

You can build/test with --profile=/tmp/profile.gz.

Then get the time with:

zcat /tmp/profile.gz | jq '.traceEvents | .[] | select(.name == "sandbox.createFileSystem") | .dur' | paste -sd+ - | bc

At least there I'd expect to see an improvement.

More info about profiling: https://bazel.build/advanced/performance/json-trace-profile
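Assuming the profile follows the Chrome trace event format (which is what the trace viewer consumes), the dur values are in microseconds, so a variant of the command above can report the total in seconds directly:

zcat /tmp/profile.gz | jq '[.traceEvents[] | select(.name == "sandbox.createFileSystem") | .dur] | add / 1e6'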

oquenchil avatar Feb 23 '24 10:02 oquenchil

I ran a few full rebuilds of my project:

  • Bazel 7.0.2 + --experimental_sandbox_async_tree_delete_idle_threads=0

    • Completes in 2254 seconds
    • Output of your command: 20925239343
  • Bazel 7.1.0rc1 + --experimental_sandbox_async_tree_delete_idle_threads=0

    • Completes in 1920 seconds
    • Output of your command: 7356170441
  • Bazel 7.1.0rc1

    • Completes in 1858 seconds
    • Output of your command: 5730664498

I'm not sure what units Bazel's profiling uses, but I pasted the output as you requested.

As far as I can tell, it shaved a few minutes off my build. That's pretty good, considering I have a 1167-second critical path consisting of a clang compiler build plus an emscripten cache build. So I'm sure it helps incremental builds even more. Nice work!

jeremiahar avatar Feb 23 '24 23:02 jeremiahar

Thank you Jeremiah. That's still quite a bit of overhead but we'll add more improvements.

oquenchil avatar Feb 26 '24 20:02 oquenchil