remote_execution: synchronize concurrent upload of identical artifacts
Currently, upload requests are handled in parallel with no knowledge of other ongoing requests. If multiple actions depend on the same set of large, locally available artifacts, they will all upload those artifacts at the same time. This is particularly wasteful for large files.
Store in-flight requests in a dashmap, and wait on the existing upload if one is already in flight for the same artifact.
This is an up-to-date refile of https://github.com/facebook/buck2/pull/750
There is a subtle bug in this PR. All of these remote execution functions are executed from tasks that can be cancelled. When a second task obtains a tokio::sync::watch::channel Receiver to wait on the outcome of the first task, you've got a problem: the first task may end up being cancelled. I'm guessing that will cause the Sender side to drop and the Receiver to get a "channel closed" error. I think the best fix for this would probably be to use shared futures: https://docs.rs/futures/latest/futures/future/trait.FutureExt.html#method.shared
Another interesting observation here is that this PR only deduplicates larger blobs. Smaller blobs will still be uploaded multiple times from multiple ongoing tasks.