Implement missing parts of OSS RE caching
This enables action caching on OSS builds. There were two missing pieces:
- write_action_result, for writing to the action cache. This was the major piece; without it, all you have is a bunch of uploaded CAS blobs for various things, but no way to find them again from the action digest. Now that the action results are uploaded as well, buck can find them when it's querying for cached actions based on the calculated digest alone.
- upload_blob was a less-exercised piece of code, it was only used for uploading stdout and stderr, and even then, only if they were more than 50KiB. I have added a test for it, but not sure if
internalis the right one for it to be in, or when these tests get run.
I've successfully used this implementation with BuildBuddy, both from Linux (using remote_enabled = True), and from macOS, with remote_enabled = False but caching still enabled. I have only tested it with RBE v2.3+ output_paths mode.
As an implementation note, I found it very difficult to even figure out that these parts weren't implemented. Buck2 was just saying "0% cache hits mate" and reporting that every re_action_cache (= attempts to fetch a cached action, not attempts to cache an executed action!) had failed. None of these failures were logged anywhere. I don't know what the appropriate mechanism to log this stuff is, but I whacked a tracing::warn for failed action result cache writes in there.
The TODOs around symlinking are related to https://github.com/facebook/buck2/issues/222
Having upload blob take a digest shouldn't be a big deal since I think the amount of data flowing through this API isn't big enough to matter. I can see why this is more convenient and matches the upload API better.
However, once we are going through that path, maybe we should just delete upload_blob entirely and reuse upload? That already takes a InlinedBlobWithDigest so then we could have fewer functions?
However, once we are going through that path, maybe we should just delete
upload_blobentirely and reuseupload? That already takes aInlinedBlobWithDigestso then we could have fewer functions?
There is a lot of "code overhead" for each piece of GRPC client API you want to expose. There are about 5 layers of nesting that, for fn upload_blob, add exactly nothing at each layer. Each layer basically does "return this.field.upload_blob(...)". So it would certainly be economical to delete one of those function stacks. However I imagine there are opportunities to collapse the nesting and make it less costly to maintain in the first place. I might report back on that.
Some of these layers of forwarding are so we can inject the internal closed source RE client (closed source because it uses a lot of systems that aren't open source, rather than because it has magical properties). They might need to stay (or might not).
Is there anything I can do here to help push this to being landed? I'm happy to finish the PR if you'd like @cormacrelf
Hi @benbrittain, I haven't had any time to work on this recently. I've given you push access to my fork cormacrelf/buck2, so you can keep working on the rbe-writes branch.
Is there anything I can do here to help push this to being landed? I'm happy to finish the PR if you'd like @cormacrelf
@cormacrelf @benbrittain any update on this PR? This might help our RE execution efficiency, and I can help if no one is working on it.
Hey @miaoyipu! It's on my near term TODO but I'm certainly not against you picking it up. If you do so though, let me know :)
Hi @cormacrelf
I've successfully used this implementation with BuildBuddy, both from Linux (using remote_enabled = True), and from macOS, with remote_enabled = False but caching still enabled. I have only tested it with RBE v2.3+ output_paths mode.
I am also trying to use this with Buildbuddy and disabled remote_enabled on Linux but I cannot get it to work. Could you share your config so I can compare that to mine? Thank you!
\edit: Never mind, it's working now. Thank you for the implementation @cormacrelf!