bazel Remote Output Service: place bazel-out/ on a FUSE file system

Builds that yield large output files can generate lots of network bandwidth when remote execution is used. To combat this, we have flags such as --remote_download_minimal to disable downloading of output files. Unfortunately, this makes it hard to perform ad hoc exploration of build outputs.

In an attempt to solve this, this change adds an option to Bazel named --remote_output_service. When enabled, Bazel effectively gives up the responsibility of managing a bazel-out/ directory. Instead, it calls into a gRPC service to request a directory and creates a symlink that points to it.

Smart implementations of this gRPC service may use things like FUSE to let this replacement bazel-out/ directory be lazy-loading, thereby reducing network bandwidth significantly. In order to create lazy-loading files and directories, Bazel can call into a BatchCreate() RPC that takes a list of Output{File,Directory,Symlink} messages, similar to REv2's ActionResult. This call is also used to create runfiles directories by providing fictive instances of OutputSymlink.

To prevent Bazel from reading the contents of files stored in the FUSE file system (which would cause network I/O), the protocol offers a BatchStat() call that can return information such as the REv2 digest. Though this is redundant with --unix_digest_hash_attribute_name, there are a couple of reasons why I added this feature:

For non-Linux operating systems, it may make more sense to use NFSv4 instead of FUSE (i.e., running a virtual NFS daemon on localhost). Even though RFC 8276 adds support for extended attributes to NFSv4, not all operating systems implement it.
It addresses the security/hermeticity concerns that people raised when this feature was added. There is no way to add extended attributes to files that can't be tampered with (as a non-root user), while using gRPC solves that problem inherently.
Callers of Bazel's BatchStat.batchStat() may generate many system calls successively. This causes a large number of context switches between Bazel and the FUSE daemon. Using gRPC seems to be cheaper.

By requiring that the output path returned by the gRPC service is writable, no-remote actions can still run as before, both with sandboxing enabled and disabled. The only difference is that it will use space on the gRPC service side, as opposed to the user's home directory (though the gRPC service may continue to store data in the user's home directory.

I have a server implementation is written in Go on top of Buildbarn's storage and file system layer. My plan is to release the code for this service as soon as I've got a 'thumbs up' on the overall approach.

Jan 13 '21 15:01 EdSchouten

Initially my plan was to try to get the protocol that Bazel will speak with the FUSE daemon be standardized as part of the remote-apis project. Mailing list discussion:

https://groups.google.com/g/remote-execution-apis/c/qOSWWwBLPzo

After discussions that took place during the monthly meeting, and after taking another look at some of the (remotely) similar protocols in this space, such as Buildgrid's local_casd protocol, I think it would make more sense to keep a protocol like this in the Bazel tree, so that the protocol remains minimal.

I have not yet invested time in adding unit/integration testing coverage for this code. This is something that I'd rather do after first having some discussion here. For example, what do you think of the overall approach? What about the protocol? I'd love to get your feedback!

Cc @janakdr @bergsieker @ulfjack

Jan 13 '21 15:01 EdSchouten

cc @susinmotion @alexjski

Jan 13 '21 16:01 janakdr

cc @tetromino

Jan 13 '21 23:01 rupertks

cc @coeuvre

Jan 14 '21 07:01 meisterT

@lberki if needed, please reassign to someone that could shepherd this PR into Google?

Jan 15 '21 05:01 jin

I've just rebased this change on top of latest master, and made CI happy once again. Earlier today I released my implementation of the FUSE daemon on GitHub as well:

https://github.com/buildbarn/bb-clientd

:tada: :tada: :tada:

Be sure to let me know what I can do from my end to get this change polished up, upstreamed, etc. etc. etc.

Feb 03 '21 15:02 EdSchouten

@alexjski could you take the lead on this? Thanks!

Feb 03 '21 15:02 janakdr

/cc @philwo

Feb 03 '21 15:02 lberki

This is amazing 🥳

Feb 04 '21 20:02 sohaibiftikhar

As a general comment, a design doc would be useful so that everyone can understand more easily what's going on. We even have a separate repository for those:

https://github.com/bazelbuild/proposals

Apart from that procedural note, my main worry is that this adds Yet Another Way for Bazel to handle the metadata pertaining to action outputs, of which we already have too many (objfsd, the internal equivalent of this FUSE service, ActionFS, the output tree under bazel-out/, the local action cache, the disk cache, etc.

@philwo is probably the most competent person to comment on all these things. I personally don't feel confident saying anything about this change other than "pls no I don't need more abstractions", which is not exactly helpful. At the very least, it should be written down somewhere how this proposal interacts with these other metadata handling systems.

Feb 08 '21 08:02 lberki

As a general comment, a design doc would be useful so that everyone can understand more easily what's going on. We even have a separate repository for those:

https://github.com/bazelbuild/proposals

Done! https://github.com/bazelbuild/proposals/pull/211

Feb 10 '21 14:02 EdSchouten

Thanks a lot :)

Feb 10 '21 15:02 lberki

My start into this year hasn't exactly been great unfortunately, so I have a backlog of things I have to go through first. I won't have time to check this out this week, but please ping me if I still haven't replied here by mid of next.

Feb 10 '21 15:02 philwo

Friendly ping! (Also rebased my change.)

Feb 22 '21 17:02 EdSchouten

Hi @alexjski, I've just rebased my PR and addressed most of the comments that have been provided so far. PTAL!

Apr 29 '21 14:04 EdSchouten

@EdSchouten, do you have a fresh rebase for this patch?

Aug 25 '21 11:08 moroten

FWIW / if it is relevant, I am working on adding setxattr support to the current DiskCacheClient for macOS / Linux native filesystems (using unix_digest_hash_attribute_name).

I was wondering what the use cases were for getxattr without setxattr, I assume maybe that's covered somehow in the FUSE filesystem used by Google folks? I should wrap this up sometime in the next few days, but the WIP is here.

https://github.com/bazelbuild/bazel/compare/master...ButterflyNetwork:setxattr?expand=1

Dec 19 '21 14:12 jgavris

I was wondering what the use cases were for getxattr without setxattr, I assume maybe that's covered somehow in the FUSE filesystem used by Google folks?

This is it exactly, the xattrs are a way for the FUSE system to communicate metadata about files with Bazel (example here). This is why we actually never need to set those -- what is you use for it?

Dec 21 '21 23:12 alexjski

@alexjski We don't use FUSE / sandboxfs currently, just APFS / EXT4. Could you point me to how bazel interaction with FUSE / sandboxfs knows in which hashAttributeName to store the digest? The only usages I could find for the startup option unixDigestHashAttributeName are for getxattr, which is why I started up this branch. Maybe that's internal / in blaze?

Dec 22 '21 23:12 jgavris

@alexjski We don't use FUSE / sandboxfs currently, just APFS / EXT4. Could you point me to how bazel interaction with FUSE / sandboxfs knows in which hashAttributeName to store the digest? The only usages I could find for the startup option unixDigestHashAttributeName are for getxattr, which is why I started up this branch. Maybe that's internal / in blaze?

It is internal. It is a feature of our FUSE that it exposes this metadata as an xattr. In the example I showed, if the xattr is present, we would read it instead of digesting the file.

FUSE is really well positioned to expose information like that since it also knows about all of the writes. Bazel does not do that. In particular, setting xattrs by hand can lead to wrong digests if the file changes from unaccounted modifications:

$ echo hello /tmp/a.txt
$ setfattr -n user.digest -v 123 /tmp/a.txt
$ echo there > /tmp/a.txt
$ getfattr -n user.digest /tmp/a.txt
getfattr: Removing leading '/' from absolute path names
# file: tmp/a.txt
user.digest="123"

Ideally, we would at least clear/update such xattr on a change, but that is up to the FS to do that. I guess for Bazel execution, that should be less of a problem since we delete files before we run actions, therefore updates by locally executed actions (local/standalone strategy vs sandboxed) should be generally fine (other than ill-behaved ones e.g. writing over inputs, but those are problematic already since they potentially corrupt anything else reading the input). I guess that would put it at a similar risk level to running with --noexperimental_check_output_files.

Dec 23 '21 00:12 alexjski

@alexjski I see, thanks. So you have a particular attribute and configure your startup options there. Do you also assume SHA-256 for the hash? That's been the default for a while, but's still configurable in Bazel. Just curious about the coupling between Blaze and SandboxFS.

We'd assume that nobody touches the local disk cache (AC / CAS files) besides bazel, and we can run --noexperimental_check_output_files and get the full speed boost!

Dec 23 '21 00:12 jgavris

I'll see if I can get my PR up in the next day or two, I might ping you as a reviewer if that is recommended.

Dec 23 '21 00:12 jgavris

cc @coeuvre

Dec 23 '21 06:12 meisterT

@EdSchouten @coeuvre Is this something that we still want to land? Still seems important for BwtB.

Jul 13 '22 13:07 brentleyjones

Any updates to this PR? And which bazel version could potentially have this merged in?

For some context, I’ve been evaluating —remote_download_symlink_template to reduce disk consumption on the client machine that calls bazel. It looks from the design proposal in this PR the stated —remote_download_symlink_template has edge cases that isn’t solved by the flag (eg, dynamic linked binary) so is remote output service designed to cover those problems? @EdSchouten

Sep 25 '22 22:09 leakingtapan

Eternal backlog; I don't have any time right now to get this polished up. What I did do, was update this branch/PR to be on top of Bazel 5.1.1.

Cc @joeljeske

Oct 11 '22 13:10 EdSchouten

Hey! It's a miracle! I've finally found some time to rebase this change on top of master. I have also processed many of the comments left behind by @alexjski.

@alexjski Even though a lot of time has passed, could I persuade you to review it once more? Thanks!

Oct 18 '22 14:10 EdSchouten

It would be really nice if we could get this in for 6.0 🙏.

Oct 18 '22 14:10 brentleyjones

What is the review status on this PR?

Dec 15 '22 12:12 moroten

Hey @alexjski,

Code shepherding--I some amount of code hygiene comments to be made (didn't add them all). This is just a warning that this will come, I assume you were waiting to resolve bigger issues before we go there. Anyway, I would recommend going over the warnings e.g. Intellij adds automatically for this change at some point.

Yeah, that's to be expected. I'm not really a Java programmer, so I'm not really up to date on what the recommended set of tooling is. Out of curiosity, is there some kind of lint target/script in the Bazel repository that I can run to print such warnings, as opposed to installing IntelliJ?

Jan 03 '23 10:01 EdSchouten

bazel bazel copied to clipboard

Remote Output Service: place bazel-out/ on a FUSE file system

bazel
bazel copied to clipboard