HDDS-11043. Explore client retry optimizations after write() and hsync() are desynced
## What changes were proposed in this pull request?
- Introduced `RetryRequestBatcher`, a sliding-window planner that keeps failed `writeChunk` requests sorted by end offset, retains only the most recent `putBlock` offset, and produces an optimized retry plan (combined chunk list + `putBlock` flag); see the sketch after this list.
- Wired the batcher into `BlockOutputStream`: every outgoing `writeChunk`/`putBlock` updates the window, `writeOnRetry` now replays the optimized plan (piggybacking the final chunk when supported), and acknowledgements/clears shrink the window once `putBlock` succeeds.
- Added `TestRetryRequestBatcher` to exercise the batching logic across basic, duplicate `putBlock`, acknowledgement, complex, and bookkeeping scenarios.
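For readers unfamiliar with the new class, here is a minimal sketch of the sliding-window shape described above. Only the responsibilities of `RetryRequestBatcher` and the `acknowledgeUpTo`/`optimizeForRetry` calls come from this PR; the `Sketch` suffix, the `onWriteChunk`/`onPutBlock` method names, and the `ChunkData`/`RetryPlan` types are illustrative stand-ins, not the actual implementation.

```java
// Rough sketch only, not the real RetryRequestBatcher.
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class RetryRequestBatcherSketch {

  /** Stand-in for the client-side buffer backing one writeChunk request. */
  interface ChunkData { }

  /** Result of optimizeForRetry(): ordered chunk list plus a single putBlock flag. */
  static final class RetryPlan {
    final List<ChunkData> chunks;
    final boolean needsPutBlock;

    RetryPlan(List<ChunkData> chunks, boolean needsPutBlock) {
      this.chunks = chunks;
      this.needsPutBlock = needsPutBlock;
    }
  }

  // Outstanding writeChunk requests, keyed (and therefore sorted) by end offset.
  private final TreeMap<Long, ChunkData> pendingWrites = new TreeMap<>();
  // Only the most recent putBlock offset is retained.
  private long latestPutBlockOffset = -1;

  /** Track every outgoing writeChunk by its end offset. */
  void onWriteChunk(long endOffset, ChunkData data) {
    pendingWrites.put(endOffset, data);
  }

  /** Track every outgoing putBlock; older offsets are superseded. */
  void onPutBlock(long offset) {
    latestPutBlockOffset = Math.max(latestPutBlockOffset, offset);
  }

  /** Shrink the window once a putBlock up to flushPos has been committed. */
  void acknowledgeUpTo(long flushPos) {
    pendingWrites.headMap(flushPos, true).clear();
    if (latestPutBlockOffset <= flushPos) {
      latestPutBlockOffset = -1;
    }
  }

  /** Collapse the window into one ordered chunk list + a single putBlock flag. */
  RetryPlan optimizeForRetry() {
    return new RetryPlan(new ArrayList<>(pendingWrites.values()),
        latestPutBlockOffset >= 0);
  }
}
```

Keeping the window in a map keyed by end offset (as in this sketch) makes both the ordered replay list and the `acknowledgeUpTo` trim cheap.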
Benefit:
- Shared setup: every `writeChunk`/`putBlock` RPC now flows through `RetryRequestBatcher`. On the happy path we track each write's end offset and the latest `putBlock` offset. If an RPC fails, the window already knows exactly which buffers still need to be retried and in what order; when a `putBlock` succeeds, `acknowledgeUpTo(flushPos)` removes all requests the datanodes have committed.
- Retry without piggyback:
  - Old sequence: `writeOnRetry` blindly replayed each allocated chunk, issuing a `writeChunk` immediately followed by a standalone `putBlock`. That meant `n` failed chunks produced `2n` retry RPCs, even when multiple writes could have been coalesced before the next metadata update.
  - New sequence: we call `retryRequestBatcher.optimizeForRetry()`. This collapses all outstanding chunks into a single ordered list and keeps just the highest `putBlock` offset. The retry loop now issues each chunk exactly once and sends a single `putBlock` at the end (see the replay sketch after this list). Result: fewer network round trips, less checksum/compression work, and shorter retry latency.
- Retry with piggyback enabled:
  - Before: we still replayed every chunk one-by-one, and each chunk triggered a piggybacked `writeChunkAndPutBlock`, so we ended up sending a `putBlock` for every chunk in the window.
  - After: we write the combined chunk list sequentially; when we reach the last outstanding chunk, we piggyback the final `putBlock` on that single RPC (`writeChunkAndPutBlock`). All preceding chunks are sent as plain `writeChunk` calls. Effectively we collapse the retries to "N chunk writes + 1 piggybacked flush" instead of "N piggybacked writes", reducing both network chatter and datanode commit work while preserving the benefits of piggyback (no extra standalone `putBlock`).
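To make the two retry modes concrete, here is a rough sketch of how `writeOnRetry` could consume the optimized plan. It builds on the illustrative batcher sketch above; the `send*` helpers are placeholders, not the real `BlockOutputStream` RPC methods.

```java
// Illustrative-only replay loop; not the actual writeOnRetry implementation.
class RetryReplaySketch {
  private final RetryRequestBatcherSketch batcher = new RetryRequestBatcherSketch();

  void writeOnRetrySketch(boolean piggybackSupported) {
    RetryRequestBatcherSketch.RetryPlan plan = batcher.optimizeForRetry();
    int n = plan.chunks.size();
    for (int i = 0; i < n; i++) {
      boolean last = (i == n - 1);
      if (last && plan.needsPutBlock && piggybackSupported) {
        // Final outstanding chunk: piggyback the putBlock on this single RPC,
        // so the retry costs "N chunk writes + 1 piggybacked flush".
        sendWriteChunkAndPutBlock(plan.chunks.get(i));
      } else {
        // All preceding chunks go out as plain writeChunk calls,
        // with no per-chunk metadata update.
        sendWriteChunk(plan.chunks.get(i));
      }
    }
    if (plan.needsPutBlock && (!piggybackSupported || n == 0)) {
      // Without piggyback support (or with nothing left to rewrite), send
      // exactly one standalone putBlock at the end, not one per chunk.
      sendPutBlock();
    }
  }

  // Placeholder RPC stubs standing in for the real container protocol calls.
  private void sendWriteChunk(RetryRequestBatcherSketch.ChunkData c) { }
  private void sendWriteChunkAndPutBlock(RetryRequestBatcherSketch.ChunkData c) { }
  private void sendPutBlock() { }
}
```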
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-11043
## How was this patch tested?
`TestRetryRequestBatcher` unit test (an illustrative sketch of the kind of scenario it covers follows).
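As a rough illustration of one of the scenarios named above (acknowledgement shrinking the window before the plan is built), here is a hedged sketch written against the illustrative batcher sketch rather than the real class:

```java
// Hedged sketch only; the real TestRetryRequestBatcher may assert differently.
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;

import org.junit.jupiter.api.Test;

class RetryRequestBatcherSketchTest {

  @Test
  void acknowledgedWritesLeaveTheWindow() {
    RetryRequestBatcherSketch batcher = new RetryRequestBatcherSketch();
    RetryRequestBatcherSketch.ChunkData first = new RetryRequestBatcherSketch.ChunkData() { };
    RetryRequestBatcherSketch.ChunkData second = new RetryRequestBatcherSketch.ChunkData() { };

    batcher.onWriteChunk(4096, first);
    batcher.onWriteChunk(8192, second);
    batcher.onPutBlock(4096);

    // A committed putBlock at offset 4096 removes the first chunk and clears
    // the pending putBlock, so only the second chunk remains to retry.
    batcher.acknowledgeUpTo(4096);
    RetryRequestBatcherSketch.RetryPlan plan = batcher.optimizeForRetry();

    assertEquals(1, plan.chunks.size());
    assertFalse(plan.needsPutBlock);
  }
}
```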