orc icon indicating copy to clipboard operation
orc copied to clipboard

ORC-998: Sharing compression output buffer among treeWriter: Refactoring within OutStream for portability

Open autumnust opened this issue 3 years ago • 4 comments

What changes were proposed in this pull request?

There's individual instance of OutStream within each TreeWriter created by WriterContext#createStream method. Within OutStream, there are totally 3 buffers:

  • current: the regular input buffer holding uncompressed, unencrypted bytes.
  • compress: the output buffer holding compressed bytes
  • overflow: same as "compress" but only used when the last compression output is larger than remaining capacity of compress buffer.

Potentially the compress and overflow buffer don't have to be allocated individually within each OutStream object, but shared across all of them so to save memory allocation.

This PR is the first step for sharing the compression output buffer, which refactors internal of OutStream and make the relevant object bundled together since they are logically related(and details of dealing with overflow doesn't have to be visible). This refactoring makes it easier to share the compression output buffer as a pass-in arguments in the follow-up PR.

Why are the changes needed?

For the context of ORC-997, this change makes the compression output buffer, as a single entity, easier to be shared and passed in from caller.

How was this patch tested?

There's no functional changes from this PR so passing all existed unit tests. Also added a new test in TestOutStream to make sure that, the scenario where codec.compress() returns false (meaning the compression output is larger than original input) is also covered in the unit test.

autumnust avatar Sep 16 '21 00:09 autumnust

Thank you for making a PR, @autumnust .

dongjoon-hyun avatar Sep 16 '21 16:09 dongjoon-hyun

LGTM, Nit: TestOutStream.java line 82 has an extra space

guiyanakuang avatar Oct 29 '21 05:10 guiyanakuang

Gentle ping, @autumnust .

dongjoon-hyun avatar Jul 07 '22 17:07 dongjoon-hyun

Gentle ping, @autumnust .

dongjoon-hyun avatar Aug 09 '22 05:08 dongjoon-hyun

Is there interest in reviving this PR? Most work is done, just needs some absorb the comments. I can look into it

paliwalashish avatar Sep 23 '23 04:09 paliwalashish

You can take over while keeping his authorship as the first commit in your PR, @mystic-lama .

dongjoon-hyun avatar Sep 30 '23 18:09 dongjoon-hyun

@dongjoon-hyun Sent a PR #1633, not sure if it's the correct way. All commits are still owned by autumnust. Let me know it that's the correct way.

paliwalashish avatar Oct 02 '23 15:10 paliwalashish