
Reject Concurrent Duplicate Writes with Abort Error Instead of Blocking

Open • amishra-u opened this issue 2 years ago • 1 comment

Problem

A Buildfarm worker may receive a second write request for a digest while an earlier write with exactly the same input is still in progress. This happens primarily after client-side timeouts, when the worker has not yet closed or canceled the initial request. Because each write request carries a unique ID, this situation can only occur during retries.

Currently, when a worker receives such a duplicate request, it blocks the second request until the first one finishes or is canceled. While the second request is blocked, buildfarm-server keeps sending data, which can accumulate in the socket buffer and significantly slow down the worker. This commonly happens at peak traffic, especially when workers are under heavy CPU load.

Solution

Instead of blocking the second request until the first one completes, reject the second request immediately with an Abort error.

amishra-u, Sep 29 '23 18:09
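The proposed fail-fast behavior can be sketched as follows. The sketch is written in Python for brevity, although Buildfarm itself is written in Java. It assumes that in-progress writes are tracked in a per-worker set keyed by digest. `WriteRegistry`, `begin_write`, `end_write`, and `DuplicateWriteAborted` are hypothetical names: they stand in for Buildfarm's actual write tracking and for a gRPC `ABORTED` status, and are not its real API.

```python
import threading


class DuplicateWriteAborted(Exception):
    """Hypothetical stand-in for failing the RPC with an ABORTED status."""


class WriteRegistry:
    """Hypothetical per-worker record of digests with a write in progress."""

    def __init__(self):
        self._guard = threading.Lock()
        self._in_progress = set()

    def begin_write(self, digest):
        # Check and claim under one lock so two concurrent retries cannot
        # both succeed; a duplicate fails fast instead of waiting.
        with self._guard:
            if digest in self._in_progress:
                raise DuplicateWriteAborted(f"write already in progress for {digest}")
            self._in_progress.add(digest)

    def end_write(self, digest):
        # Called when the first write completes or is canceled.
        with self._guard:
            self._in_progress.discard(digest)


if __name__ == "__main__":
    registry = WriteRegistry()
    registry.begin_write("abc123/42")      # hypothetical digest; first write claims it
    try:
        registry.begin_write("abc123/42")  # retried write while the first is still open
    except DuplicateWriteAborted as err:
        print("aborted:", err)
    registry.end_write("abc123/42")        # first write finishes or is canceled
    registry.begin_write("abc123/42")      # a later retry now succeeds
```

The current blocking behavior described under Problem would correspond to waiting (for example, on a condition variable) until the digest is released, rather than raising.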

The premise of this change doesn't line up with the flow control: you can't write any bytes to the file without exclusivity, and you cannot request new bytes from the client on the worker write stream without writing bytes. The server won't send more bytes to the worker without those new requests for bytes. The only way this circumstance can arise is if flow control is disabled.

werkt, Oct 02 '23 22:10
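The flow-control argument in the reply above can be illustrated with a toy, tick-based model. This is a sketch under simplifying assumptions, not Buildfarm's or gRPC's implementation. With flow control, the worker holds one chunk of credit at a time and renews it only after writing a chunk. This is roughly how a gRPC receiver behaves when it calls `request(n)` under manual flow control. Without flow control, the sender pushes a chunk every tick regardless of progress. `peak_buffered` is a hypothetical helper, and the writer is assumed unable to write (no exclusivity) for the first `blocked_ticks` ticks.

```python
from collections import deque


def peak_buffered(total_chunks, blocked_ticks, flow_control):
    """Toy model: a sender streams `total_chunks` chunks to a worker whose
    writer lacks exclusivity for the first `blocked_ticks` ticks.
    Returns the largest number of chunks ever queued at the worker."""
    buffer = deque()
    sent = written = tick = peak = 0
    credit = 1  # one initial request for bytes (used only with flow control)
    while written < total_chunks:
        # Sender: with flow control, send only against an outstanding request.
        if sent < total_chunks and (not flow_control or credit > 0):
            buffer.append(sent)
            sent += 1
            if flow_control:
                credit -= 1
        peak = max(peak, len(buffer))
        # Worker: once it holds exclusivity, write one chunk per tick; each
        # completed write is what allows it to request the next chunk.
        if tick >= blocked_ticks and buffer:
            buffer.popleft()
            written += 1
            if flow_control:
                credit += 1
        tick += 1
    return peak
```

In this model, 100 chunks sent to a writer that is blocked for 10 ticks leave at most 1 chunk buffered with flow control, but 11 without it. This is consistent with the point that unbounded buffering at the worker requires flow control to be disabled.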