
write readme and example

Open ratal opened this issue 2 years ago • 3 comments

Hi, the documentation describes well what can be done for reading. However, I am writing a crate to convert a specific measurement file format into parquet, and there is no explanation of how to write a parquet file. I found an implementation example in the tests, but it is too limited for my case. For instance, I do not know what the most suitable parallelization would be, considering I already have the data in column form in memory. I suppose the documentation is not yet mature because parquet2 is still in active development and the main usage of this crate is reading, but more explanation would help.

ratal avatar Apr 26 '22 23:04 ratal

Hi ratal, if you use arrow2, the arrow2 crate has a good example.

TurnOfACard avatar Apr 27 '22 04:04 TurnOfACard

Thanks, I managed to write a first proto link. I had to remove statistics, since it is difficult to pass max and min values through dyn traits. I am trying to run the per-column loop in parallel with rayon, but FileWriter seems to need an Arc<Mutex>. Anyway, thanks again for pointing to the documentation. You can keep the issue open as a reminder or close it. If I still have issues, I will open a new ticket specifically for them.

ratal avatar Apr 28 '22 21:04 ratal

Cool!

For rayon, we have an example here: https://github.com/jorgecarleitao/arrow2/blob/main/examples/parquet_write_parallel/src/main.rs#L45

We currently only support an iterator (not a parallel iterator) of columns, which is why the columns must be pre-serialized before writing.
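To illustrate the pattern described here (serialize columns in parallel, then hand the results to a plain sequential iterator), the following is a minimal sketch using only the Rust standard library. It is not the linked arrow2 example and does not call any parquet2 or arrow2 API: `serialize_column`, `serialize_columns_parallel`, and `write_columns` are hypothetical names, the little-endian byte encoding is a stand-in for real page encoding, and scoped threads are used in place of rayon.

```rust
use std::io::{self, Write};
use std::thread;

/// Hypothetical stand-in for column serialization: encodes each value as
/// 8 little-endian bytes. This is NOT a parquet2 function.
fn serialize_column(values: &[i64]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Serializes every column on its own scoped thread and returns the
/// resulting buffers in the original column order.
fn serialize_columns_parallel(columns: &[Vec<i64>]) -> Vec<Vec<u8>> {
    thread::scope(|scope| {
        // Spawn all threads first so the columns are processed concurrently.
        let handles: Vec<_> = columns
            .iter()
            .map(|column| scope.spawn(move || serialize_column(column)))
            .collect();
        // Join in spawn order, which preserves column order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("serialization thread panicked"))
            .collect()
    })
}

/// Hypothetical sequential writer: consumes an ordinary (non-parallel)
/// iterator of pre-serialized columns and returns the bytes written.
fn write_columns<W: Write>(
    writer: &mut W,
    columns: impl Iterator<Item = Vec<u8>>,
) -> io::Result<usize> {
    let mut written = 0;
    for bytes in columns {
        writer.write_all(&bytes)?;
        written += bytes.len();
    }
    Ok(written)
}
```

Because only the serialization step runs concurrently, the writer is touched by a single thread and needs no `Arc<Mutex>`: call `serialize_columns_parallel(&columns)` first, then pass `serialized.into_iter()` to `write_columns`.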

jorgecarleitao avatar Apr 28 '22 21:04 jorgecarleitao