orc icon indicating copy to clipboard operation
orc copied to clipboard

ORC-1711: [C++] Introduce a memory block size parameter for writer option

Open luffy-zh opened this issue 1 year ago • 2 comments

What changes were proposed in this pull request?

  1. Add the memory block size parameter to the writer option, which initializing the compressed input buffer block size
  2. The compressed stream will retain the input buffer until the input buffer size reaches the compression block size, allowing the compressed stream to start with a minimal initial memory footprint.

Why are the changes needed?

This code segment distinguishes between the compression block size and the input buffer size to solve the issue.

How was this patch tested?

The UTs in TestCompression.cc and TestWriter.cc can cover this patch.

luffy-zh avatar May 13 '24 09:05 luffy-zh

@luffy-zh Thank you for making the PR. Could you explain why we need outputStream->getRawInputBufferSize() to record position? it seems that flushedSize and bufferPosition were't changed.

When we call recordPosition(), we need to record the output buffer length and input buffer length. We have multiple blocks in the input buffer, 'bufferPosition' only records the effective length of the last block, that's why we need to use getRawInputBufferSize() in the outputStream.

luffy-zh avatar May 14 '24 11:05 luffy-zh

@luffy-zh Thank you for making the PR. Could you explain why we need outputStream->getRawInputBufferSize() to record position? it seems that flushedSize and bufferPosition were't changed.

When we call recordPosition(), we need to record the output buffer length and input buffer length. We have multiple blocks in the input buffer, 'bufferPosition' only records the effective length of the last block, that's why we need to use getRawInputBufferSize() in the outputStream.

Make sense.

ffacs avatar May 14 '24 12:05 ffacs

@luffy-zh Thank you.

ffacs avatar May 22 '24 11:05 ffacs