arrow icon indicating copy to clipboard operation
arrow copied to clipboard

ARROW-17825: [C++] Allow the possibility to write several tables in ORCFileWriter

Open LouisClt opened this issue 2 years ago • 7 comments

I had the need to write an ORC file little by little, so as to not consume too much memory. Following this discussion, it appeared that the API did not seemed to prevent doing that, but that the internal implementation was not reusing the writer accordingly.

This PR makes the needed changes to reuse the "writer_" correctly.

I do not think that the preceding behaviour was correct, as calling several time the "Write" method would lead to incorrect ORC files.

LouisClt avatar Sep 23 '22 08:09 LouisClt

https://issues.apache.org/jira/browse/ARROW-17825

github-actions[bot] avatar Sep 23 '22 12:09 github-actions[bot]

:warning: Ticket has not been started in JIRA, please click 'Start Progress'.

github-actions[bot] avatar Sep 23 '22 12:09 github-actions[bot]

Yes I will do this once I made sure that the existing tests still pass.

LouisClt avatar Sep 23 '22 15:09 LouisClt

I added a test for checking the schema. I added a specific error, and checked that an error was found. (however I didn't test that the error was the specific one ).

LouisClt avatar Sep 28 '22 13:09 LouisClt

Good. I was wondering if we need a test for writing a RecordBatch, for the moment this is not tested. Is this ok without test ?

LouisClt avatar Oct 18 '22 07:10 LouisClt

Good. I was wondering if we need a test for writing a RecordBatch, for the moment this is not tested. Is this ok without test ?

Oh we would prefer to have a test for that.

wjones127 avatar Oct 18 '22 16:10 wjones127

I added a test for testing the write of RecordBatches with ORC. As the RecodBatch method just makes a call to the "Table" one, I think this is sufficient. The test failures that remains seem to be unrelated.

LouisClt avatar Oct 19 '22 13:10 LouisClt

Sorry I let this sit @LouisClt.

This looks good to me. @pitrou do you have any comments? Or can I merge this?

wjones127 avatar Nov 14 '22 21:11 wjones127

Benchmark runs are scheduled for baseline = 4d8c21bd303833c124f9d5801755c953c6c3260e and contender = 77f099fb5c324afc8ee38cda4976bf20a08e7a4a. 77f099fb5c324afc8ee38cda4976bf20a08e7a4a is a master commit associated with this PR. Results will be available as each benchmark for each run completes. Conbench compare runs links: [Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2 [Finished :arrow_down:1.58% :arrow_up:0.0%] test-mac-arm [Finished :arrow_down:0.27% :arrow_up:0.82%] ursa-i9-9960x [Finished :arrow_down:1.43% :arrow_up:0.03%] ursa-thinkcentre-m75q Buildkite builds: [Finished] 77f099fb ec2-t3-xlarge-us-east-2 [Finished] 77f099fb test-mac-arm [Finished] 77f099fb ursa-i9-9960x [Finished] 77f099fb ursa-thinkcentre-m75q [Finished] 4d8c21bd ec2-t3-xlarge-us-east-2 [Finished] 4d8c21bd test-mac-arm [Finished] 4d8c21bd ursa-i9-9960x [Finished] 4d8c21bd ursa-thinkcentre-m75q Supported benchmarks: ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True test-mac-arm: Supported benchmark langs: C++, Python, R ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot avatar Nov 16 '22 07:11 ursabot