bazel icon indicating copy to clipboard operation
bazel copied to clipboard

Add option to track sandbox stashes in memory

Open oquenchil opened this issue 1 year ago • 0 comments

This change introduces the flag --experimental_inmemory_sandbox_stashes to track in memory the contents of the stashes stored with --reuse_sandbox_directories=true

With the old behavior Bazel has to perform a lot of I/O to read the contents of each stash before reusing it in a new action. Essentially, it checks every directory and subdirectory in the stashed sandbox to find out which files are different to the inputs of the new action about to be executed.

With in-memory stashes we associate each stash to the symlinks that needed to be created for that action. We also store the time at which these symlinks were created. In a background thread after the action has finished executing we stat every directory and for the ones that have changed (this should be rare) we update the contents. Because we only read the contents of the directories that have changed we do much less I/O than before.

If an action purposefully changes a sandbox symlink in-place, this won't be detected by statting the directory. I don't know any use case for this since the symlink itself is an implementation detail to achieve sandboxing. For this reason, manipulation of sandbox symlinks in-place is not supported.

Depending on the build this change might have a significant effect on memory. It should generally improve wall time further in builds where --reuse_sandbox_directories already improved wall time.

This change also introduces a minor optimization which is to associate each stash with the target that it was originally created for. When a new action wants to reuse a stash and there is more than one available, it will take the stash whose target is closest to its own. This is done with the assumption that targets that are closer together in the graph will have more inputs in common.

Fixes #22309 .

oquenchil avatar May 27 '24 20:05 oquenchil