HDDS-11043. Explore client retry optimizations after write() and hsync() are desynced


What changes were proposed in this pull request?

  • Introduced RetryRequestBatcher, a sliding-window planner that keeps failed writeChunk requests sorted by end offset, retains only the most recent putBlock offset, and produces an optimized retry plan (a combined chunk list plus a putBlock flag); see the sketch after this list.

  • Wired the batcher into BlockOutputStream: every outgoing writeChunk/putBlock updates the window, writeOnRetry now replays the optimized plan (piggybacking the final chunk when supported), and acknowledgements/clears shrink the window once putBlock succeeds.

  • Added TestRetryRequestBatcher to exercise the batching logic across basic, duplicate putBlock, acknowledgement, complex, and bookkeeping scenarios.

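As a concrete illustration, a minimal Java sketch of the kind of batcher described above might look like the following. The class and member names (RetryBatcherSketch, PendingWrite, RetryPlan) are illustrative only, not the patch's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Minimal sketch of a retry batcher along the lines described above;
 * names are illustrative, not the patch's exact API.
 */
public class RetryBatcherSketch {

  /** A pending writeChunk, keyed by the offset at which it ends. */
  record PendingWrite(long endOffset, Object chunkBuffer) { }

  /** Optimized retry plan: ordered chunk list plus a single putBlock flag. */
  record RetryPlan(List<Object> chunks, boolean needsPutBlock) { }

  // Window of not-yet-acknowledged writes, sorted by end offset.
  private final TreeMap<Long, PendingWrite> window = new TreeMap<>();
  // Only the most recent putBlock offset is retained.
  private long latestPutBlockOffset = -1;

  void recordWriteChunk(long endOffset, Object chunkBuffer) {
    window.put(endOffset, new PendingWrite(endOffset, chunkBuffer));
  }

  void recordPutBlock(long flushOffset) {
    latestPutBlockOffset = Math.max(latestPutBlockOffset, flushOffset);
  }

  /** Drop everything the datanodes have already committed up to flushPos. */
  void acknowledgeUpTo(long flushPos) {
    window.headMap(flushPos, true).clear();
    if (latestPutBlockOffset <= flushPos) {
      latestPutBlockOffset = -1;
    }
  }

  /** Collapse the window into one ordered chunk list plus one putBlock. */
  RetryPlan optimizeForRetry() {
    List<Object> chunks = new ArrayList<>();
    for (PendingWrite write : window.values()) {
      chunks.add(write.chunkBuffer());
    }
    return new RetryPlan(chunks, latestPutBlockOffset >= 0);
  }
}
```
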
Benefit:

  • Shared setup: Every writeChunk/putBlock RPC now flows through RetryRequestBatcher. On the happy path we track each write’s end-offset and the latest putBlock offset. If an RPC fails, the window already knows exactly which buffers still need to be retried and in what order; when a putBlock succeeds, acknowledgeUpTo(flushPos) removes all requests the datanodes have committed.

  • Retry without piggyback:

    • Old sequence: writeOnRetry blindly replayed each allocated chunk, issuing a writeChunk immediately followed by a standalone putBlock. That meant n failed chunks produced 2n retry RPCs, even when multiple writes could have been coalesced before the next metadata update.
    • New sequence: we call retryRequestBatcher.optimizeForRetry(). This collapses all outstanding chunks into a single ordered list and keeps just the highest putBlock offset. The retry loop now issues each chunk exactly once and sends a single putBlock at the end. Result: fewer network round-trips, less checksum/compression work, and shorter retry latency.
  • Retry with piggyback enabled:

    • Before: we still replayed every chunk one-by-one, and each chunk triggered a piggybacked writeChunkAndPutBlock, so we ended up sending a putBlock for every chunk in the window.
    • After: we write the combined chunk list sequentially; when we reach the last outstanding chunk, we piggyback the final putBlock on that single RPC (writeChunkAndPutBlock). All preceding chunks are sent as plain writeChunk calls. Effectively we collapse the retries to “N chunk writes + 1 piggybacked flush” instead of “N piggybacked writes”, reducing both network chatter and datanode commit work while preserving the benefit of piggyback (no extra standalone putBlock). A replay sketch follows this list.

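To make the “N chunk writes + 1 piggybacked flush” shape concrete, here is a hedged sketch of a replay loop consuming the plan from the batcher sketch above. The private helpers are placeholders for the container RPCs issued by BlockOutputStream, not its real method names:

```java
import java.util.List;

/**
 * Sketch of how a retry replay could consume the optimized plan; the RPC
 * helpers below are placeholders, not BlockOutputStream's actual methods.
 */
class RetryReplaySketch {

  void replay(RetryBatcherSketch.RetryPlan plan, boolean piggybackSupported) {
    List<Object> chunks = plan.chunks();
    boolean piggybacked = false;
    for (int i = 0; i < chunks.size(); i++) {
      boolean isLast = i == chunks.size() - 1;
      if (isLast && plan.needsPutBlock() && piggybackSupported) {
        // The last outstanding chunk carries the metadata update: one RPC, not two.
        writeChunkAndPutBlock(chunks.get(i));
        piggybacked = true;
      } else {
        writeChunk(chunks.get(i));
      }
    }
    if (plan.needsPutBlock() && !piggybacked) {
      // Without piggyback, a single standalone putBlock closes out the retry.
      putBlock();
    }
  }

  // Placeholder RPC helpers; the real stream sends container commands here.
  private void writeChunk(Object chunk) { }
  private void writeChunkAndPutBlock(Object chunk) { }
  private void putBlock() { }
}
```
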
What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11043

How was this patch tested?

TestRetryRequestBatcher unit tests; an illustrative case against the sketch API above is shown below.

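For flavor, a minimal JUnit 5 case written against the sketch API above (not the patch's actual TestRetryRequestBatcher) could look like this:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

/**
 * Illustrative test against the sketch API above: three outstanding chunks
 * and one putBlock collapse into a single three-chunk plan with one flush,
 * and a successful putBlock shrinks the window.
 */
class RetryBatcherSketchTest {

  @Test
  void collapsesWindowIntoSinglePlan() {
    RetryBatcherSketch batcher = new RetryBatcherSketch();
    batcher.recordWriteChunk(4, "chunk[0,4)");
    batcher.recordWriteChunk(8, "chunk[4,8)");
    batcher.recordPutBlock(8);
    batcher.recordWriteChunk(12, "chunk[8,12)");

    RetryBatcherSketch.RetryPlan plan = batcher.optimizeForRetry();
    assertEquals(3, plan.chunks().size());
    assertTrue(plan.needsPutBlock());

    // Once the putBlock at offset 8 succeeds, only the last chunk remains.
    batcher.acknowledgeUpTo(8);
    assertEquals(1, batcher.optimizeForRetry().chunks().size());
  }
}
```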