buck2 icon indicating copy to clipboard operation
buck2 copied to clipboard

remote_execution: synchronize concurrent upload of identical artifacts

Open thoughtpolice opened this issue 10 months ago • 3 comments

Currently, upload requests are handled in parallel without knowledge of other ongoing requests. If multiple actions depend on the same set of large locally available artifacts, then they will all be uploaded at the same time. This is particularly poor behavior for large files.

Just store in-flight requests in a dashmap, and wait if an upload is already extant.

thoughtpolice avatar May 07 '25 03:05 thoughtpolice

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. (Because this pull request was imported automatically, there will not be any future comments.)

facebook-github-bot avatar May 07 '25 03:05 facebook-github-bot

This is an up-to-date refile of https://github.com/facebook/buck2/pull/750

thoughtpolice avatar May 07 '25 03:05 thoughtpolice

There is a subtle bug in this PR. All of these remote execution functions are being executed from tasks that can be canceled. When a second task gets a tokio::sync::watch::channel Receiver to wait on the outcome of the first task, you've got a problem: The first task may end up being canceled. I'm guessing that will cause the Sender side to drop and cause the Reciever to get a "channel closed" error. I think the best fix for this would probably be to used shared futures. https://docs.rs/futures/latest/futures/future/trait.FutureExt.html#method.shared

Another interesting observation here is that this PR only deduplicates larger blobs. Smaller blobs will still be uploaded multiple times from multiple ongoing tasks.

bvinc avatar Oct 15 '25 20:10 bvinc