bazel-buildfarm
Reject Concurrent Duplicate Writes with Abort Error Instead of Blocking
Problem
Buildfarm workers may receive a second write request for a digest while a write with exactly the same input is still in progress. This happens primarily when a client-side timeout fires and the worker has not yet closed or canceled the initial request. Each write request has a unique ID, so these duplicates can only occur during retries.
Currently, a worker that receives a duplicate request blocks the second request until the first one finishes or is canceled. While the second request is blocked, buildfarm-server continues to send data, which can accumulate in the socket buffer and significantly slow down the worker. This commonly occurs at peak traffic, especially when the workers are under heavy CPU load.
Solution
Instead of blocking the second request until the first one completes, reject the second request immediately with an Abort error.
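The fail-fast behavior described above can be illustrated with a minimal sketch. This is not buildfarm's actual code: `InFlightWrites`, `WriteAbortedException`, and the string digest and write ID keys are hypothetical names chosen for illustration, and the plain exception stands in for whatever Abort status the worker would really return. The sketch only shows the idea of claiming a digest atomically and rejecting a concurrent duplicate instead of waiting.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical exception standing in for the Abort error returned to the caller.
class WriteAbortedException extends RuntimeException {
  WriteAbortedException(String message) {
    super(message);
  }
}

// Hypothetical registry of in-progress writes, keyed by digest.
class InFlightWrites {
  // Maps a digest to the ID of the write request that currently owns it.
  private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

  // Claims the digest for writeId, or fails immediately if another write
  // already owns it. This method never blocks.
  void begin(String digest, String writeId) {
    String existing = owners.putIfAbsent(digest, writeId);
    if (existing != null) {
      throw new WriteAbortedException(
          "write " + existing + " already in progress for digest " + digest);
    }
  }

  // Releases the claim, but only if writeId is still the owner, so a late
  // finish from a rejected request cannot release someone else's claim.
  void finish(String digest, String writeId) {
    owners.remove(digest, writeId);
  }
}
```

With this shape, a retry that arrives while the first write is still open fails right away and can be retried by the client later, rather than holding a blocked stream on the worker.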
The premise of this change doesn't line up with the flow control: you can't write any bytes to the file without exclusivity. You cannot request new bytes from the client on the worker write stream without writing bytes, and the server won't send more bytes to the worker without those new requests for bytes. The only way to get into this circumstance is if flow control is disabled.