
refactor(storage): Pipeline gRPC writes.

Open cjc25 opened this issue 6 months ago • 0 comments

Modify the gRPC writer to send additional data while waiting for the current chunk to flush. This is a substantial refactor.

Per the Go io.Writer interface contract, Write must never modify the slice that the caller provides, and must not retain it after returning. However, that doesn't mean we have to copy every byte into a writer-controlled buffer: we can refer to the byte slice in place for the duration of the call. Therefore, if callers call Write() with more bytes than the chunk size, we can send them to the service immediately, as long as we don't return from Write() until we no longer need the caller's slice.
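
As a rough illustration of the in-place idea (not the actual writer in this package), here is a minimal synchronous sketch. The names `inPlaceWriter` and `chunkSender` are hypothetical, `chunkSender` stands in for the gRPC stream, and the sketch assumes `chunkSize > 0` and that the sender is finished with the slice by the time it returns. It does no pipelining; it only shows full chunks going out straight from the caller's slice while partial chunks are copied.

```go
package main

// chunkSender is a hypothetical stand-in for the gRPC stream. It is assumed
// to be done with the chunk by the time it returns.
type chunkSender func(chunk []byte) error

// inPlaceWriter is an illustrative writer, not the library's implementation.
// chunkSize is assumed to be greater than zero.
type inPlaceWriter struct {
	chunkSize int
	buf       []byte // writer-owned; only ever holds a partial chunk
	send      chunkSender
}

// Write never modifies p and never keeps a reference to p after returning.
func (w *inPlaceWriter) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		// Fast path: nothing is buffered and the caller supplied at least
		// one full chunk, so send directly from the caller's slice.
		if len(w.buf) == 0 && len(p) >= w.chunkSize {
			if err := w.send(p[:w.chunkSize]); err != nil {
				return written, err
			}
			p = p[w.chunkSize:]
			written += w.chunkSize
			continue
		}
		// Slow path: copy as much as fits into the writer-owned buffer.
		n := w.chunkSize - len(w.buf)
		if n > len(p) {
			n = len(p)
		}
		w.buf = append(w.buf, p[:n]...)
		p = p[n:]
		written += n
		if len(w.buf) == w.chunkSize {
			if err := w.send(w.buf); err != nil {
				return written, err
			}
			w.buf = w.buf[:0]
		}
	}
	return written, nil
}
```

Because `send` returns before `Write` does, the caller's slice is no longer referenced once `Write` returns, which is the condition described above.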

By sending data as soon as callers provide it, we get a substantial single-stream throughput increase for large objects. This is especially evident when callers provide large buffers to Write() calls.

There are two follow-up investigations made possible by this refactor. The first is to flush less frequently when the caller provides write slices much larger than the chunk size. This may provide an even larger throughput improvement when Write() is called with a large buffer, and is straightforward to implement.

The second is to flush more frequently when the caller provides write slices much smaller than the chunk size (e.g. split a 16MiB chunk into 2x8MiB sub-chunks, and flush each one when it is full). This can avoid pipeline stalls in more scenarios, by increasing the likelihood that part of the chunk is available to buffer data without waiting for a flush acknowledgement.
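
One way the sub-chunk idea might look, as a sketch under assumptions rather than a proposed design: the chunk is split into equal sub-chunk buffers that circulate between a free pool and a single flusher goroutine. `subChunkWriter`, `newSubChunkWriter`, and the `flush` callback are hypothetical names; `flush` is assumed to block until the service acknowledges the data, and `chunkSize/parts` is assumed to be greater than zero.

```go
package main

// subChunkWriter is an illustrative double-buffering writer, not the
// library's implementation.
type subChunkWriter struct {
	free chan []byte   // sub-chunks that are not awaiting acknowledgement
	full chan []byte   // sub-chunks queued for flushing, in order
	done chan struct{} // closed when the flusher goroutine exits
	cur  []byte        // sub-chunk currently being filled, or nil
}

// newSubChunkWriter splits chunkSize into parts sub-chunks. flush is a
// hypothetical callback assumed to block until the data is acknowledged.
func newSubChunkWriter(chunkSize, parts int, flush func([]byte)) *subChunkWriter {
	w := &subChunkWriter{
		free: make(chan []byte, parts),
		full: make(chan []byte, parts),
		done: make(chan struct{}),
	}
	for i := 0; i < parts; i++ {
		w.free <- make([]byte, 0, chunkSize/parts)
	}
	// A single flusher goroutine keeps sub-chunks in write order.
	go func() {
		defer close(w.done)
		for b := range w.full {
			flush(b)
			w.free <- b[:0] // acknowledged; the buffer can be reused
		}
	}()
	return w
}

func (w *subChunkWriter) Write(p []byte) (int, error) {
	n := len(p)
	for len(p) > 0 {
		if w.cur == nil {
			// Stalls only when every sub-chunk is still in flight.
			w.cur = <-w.free
		}
		room := cap(w.cur) - len(w.cur)
		if room > len(p) {
			room = len(p)
		}
		w.cur = append(w.cur, p[:room]...)
		p = p[room:]
		if len(w.cur) == cap(w.cur) {
			w.full <- w.cur
			w.cur = nil
		}
	}
	return n, nil
}

// Close flushes any partial sub-chunk and waits for the flusher to finish.
func (w *subChunkWriter) Close() error {
	if len(w.cur) > 0 {
		w.full <- w.cur
		w.cur = nil
	}
	close(w.full)
	<-w.done
	return nil
}
```

With two sub-chunks, a small Write can keep filling one buffer while the other waits for its acknowledgement; with a single full-size buffer, the same Write would have to wait.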

cjc25, Jun 07 '25 03:06