Storage Content Validation - Encoder Performance Improvements

Open · ibrandes opened this issue 1 month ago · 1 comment

Summary

This PR updates StructuredMessageEncoder.java to return a reactive stream (Flux<ByteBuffer>) of encoded chunks rather than building and returning a single contiguous byte[]. The wire format (headers, segment layout, endianness, CRC fields) remains unchanged. The primary benefits are lower peak memory usage, improved throughput for large payloads, and better downstream flow control.

Key Changes in StructuredMessageEncoder.java

1) Public API: byte[] → Flux<ByteBuffer>

  • Old: public byte[] encode(ByteBuffer unencodedBuffer) generated and returned a full byte[].
  • New: public Flux<ByteBuffer> encode(ByteBuffer unencodedBuffer)
  • Encoded chunks are produced lazily (e.g., Flux.defer(...)) and can be processed incrementally or collected when a contiguous buffer is required.
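A minimal sketch of the lazy-emission idea, assuming Project Reactor is on the classpath. The class name, CHUNK_SIZE, and the collect helper are hypothetical, and the sketch only slices the input into chunks; it does not reproduce the structured message framing that StructuredMessageEncoder applies.

```java
import java.nio.ByteBuffer;
import java.util.List;
import reactor.core.publisher.Flux;

public class LazyEncodeSketch {
    // Hypothetical chunk size, chosen small so the chunking is easy to observe.
    static final int CHUNK_SIZE = 4;

    // Returns a cold Flux: no slicing happens until a subscriber arrives,
    // and each subscription starts from its own view of the input.
    static Flux<ByteBuffer> encode(ByteBuffer input) {
        return Flux.defer(() -> {
            ByteBuffer src = input.duplicate();
            return Flux.<ByteBuffer>generate(sink -> {
                if (!src.hasRemaining()) {
                    sink.complete();
                    return;
                }
                int len = Math.min(CHUNK_SIZE, src.remaining());
                ByteBuffer chunk = src.duplicate();
                chunk.limit(chunk.position() + len);
                src.position(src.position() + len);
                sink.next(chunk.slice());
            });
        });
    }

    // Collects the chunks into one contiguous array for callers that still need it.
    static byte[] collect(Flux<ByteBuffer> chunks) {
        List<ByteBuffer> parts = chunks.collectList().block();
        int total = parts.stream().mapToInt(ByteBuffer::remaining).sum();
        ByteBuffer out = ByteBuffer.allocate(total);
        for (ByteBuffer part : parts) {
            out.put(part.duplicate());
        }
        return out.array();
    }
}
```

Callers that can consume chunks incrementally would subscribe to the Flux directly; collect is only for call sites that still require a single byte[].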

2) Reactive Error Signaling

  • Validation errors (e.g., idempotency violations, content-length bounds) now propagate via terminal stream errors (Flux.error(...)) instead of synchronous exceptions, aligning with reactive consumption patterns.
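The difference for callers can be illustrated with a small sketch, again assuming Project Reactor. MAX_LENGTH, the exception type, and the outcome helper are hypothetical stand-ins, not the encoder's actual checks: the point is only that calling encode does not throw, and the failure arrives as the stream's terminal signal.

```java
import java.nio.ByteBuffer;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class ErrorSignalSketch {
    // Hypothetical bound standing in for the encoder's content-length validation.
    static final int MAX_LENGTH = 8;

    // Validation runs at subscription time; a violation becomes an onError signal.
    static Flux<ByteBuffer> encode(ByteBuffer input) {
        return Flux.defer(() -> {
            if (input.remaining() > MAX_LENGTH) {
                return Flux.<ByteBuffer>error(
                    new IllegalArgumentException("content length exceeds " + MAX_LENGTH));
            }
            return Flux.just(input.asReadOnlyBuffer());
        });
    }

    // Returns "ok" on completion, or the error message seen by the subscriber.
    static String outcome(ByteBuffer input) {
        return encode(input)
            .then(Mono.just("ok"))
            .onErrorResume(IllegalArgumentException.class, e -> Mono.just(e.getMessage()))
            .block();
    }
}
```

Code that previously wrapped encode in try/catch would need to handle the error in the reactive chain instead, for example with onErrorResume or onErrorMap.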

3) Emission Path

  • The encoder preserves the existing incremental layout (header → per-segment header/content/footer → footer) while emitting those parts directly as ByteBuffer items, avoiding a ByteArrayOutputStream and the final monolithic array allocation.
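The emission order described above could be sketched as follows, assuming Project Reactor. The two-letter ASCII tags (MH, SH, SF, MF) are placeholders for the real binary headers and footers, and all names here are hypothetical; only the ordering of the emitted parts is meant to match the description.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import reactor.core.publisher.Flux;

public class EmissionOrderSketch {
    // Placeholder for a binary header or footer.
    static ByteBuffer tag(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.US_ASCII));
    }

    // Emits: message header, then (segment header, content, segment footer)
    // for each segment in order, then the message footer.
    static Flux<ByteBuffer> emit(List<ByteBuffer> segments) {
        Flux<ByteBuffer> body = Flux.fromIterable(segments)
            .concatMap(content -> Flux.just(tag("SH"), content.asReadOnlyBuffer(), tag("SF")));
        return Flux.concat(Flux.just(tag("MH")), body, Flux.just(tag("MF")));
    }

    // Renders the emitted parts as a '|'-separated string for inspection.
    static String render(Flux<ByteBuffer> parts) {
        return parts
            .map(b -> StandardCharsets.US_ASCII.decode(b.duplicate()).toString())
            .collectList()
            .map(list -> String.join("|", list))
            .block();
    }
}
```

Each part is emitted as its own ByteBuffer, so no intermediate ByteArrayOutputStream or combined array is needed to preserve the layout.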

4) Wire-Format Consistency

  • Endianness: Numeric fields (segment number short, sizes/CRCs long) remain LITTLE_ENDIAN.
  • CRC64: When StructuredMessageFlags.STORAGE_CRC64 is set, segment footers and the message footer include CRC64 long values; otherwise, CRC fields are omitted as before.
  • Layout constants: Existing constants (e.g., V1_HEADER_LENGTH, V1_SEGMENT_HEADER_LENGTH, CRC64_LENGTH) and segment sizing logic are retained to ensure identical binary output.
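As an illustration of the little-endian field encoding, here is a sketch using only the JDK. The layout shown (a 2-byte segment number followed by an 8-byte content length) and the names are assumptions for the example; the authoritative layout is whatever V1_SEGMENT_HEADER_LENGTH and the related constants define in the source.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianFieldsSketch {
    // Assumed layout for illustration: short segment number + long content length.
    static final int SEGMENT_HEADER_LENGTH = Short.BYTES + Long.BYTES;

    // Writes both fields least-significant byte first and returns a readable buffer.
    static ByteBuffer segmentHeader(int segmentNumber, long contentLength) {
        ByteBuffer buf = ByteBuffer.allocate(SEGMENT_HEADER_LENGTH)
            .order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) segmentNumber);
        buf.putLong(contentLength);
        buf.flip();
        return buf;
    }
}
```

Because ByteBuffer defaults to big-endian, the explicit order(ByteOrder.LITTLE_ENDIAN) call is what keeps emitted chunks byte-compatible with the previous byte[] output.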

Motivation

  • Lower Peak Memory: Avoids allocating a large contiguous byte[] for big payloads by streaming chunks.
  • Throughput & Backpressure: Downstream consumers can start processing as soon as data is produced and can apply backpressure, which reduces end-to-end latency and memory pressure.

Tests

  • MessageEncoderTests.java is updated to collect the Flux<ByteBuffer> and assert the same structural, CRC, and error-case behavior as before.

ibrandes · Dec 10 '25 18:12

API Change Check

APIView identified API-level changes in this PR and created the following API reviews:

com.azure:azure-storage-common

github-actions[bot] · Dec 10 '25 18:12