parquet2
write readme and example
Hi, the documentation describes well what can be done for reading. However, I am writing a crate to convert a specific measurement file format into Parquet, and there is no explanation of how to write a Parquet file. I found an implementation example in the tests, but it is too limited for my case. For instance, I do not know which parallelization approach would be most suitable, given that I already have the data in columnar form in memory. I suppose the documentation is not yet mature because parquet2 is still in active development and the main use of this crate is reading, but more explanation would help.
Hi ratal, do you use arrow2? If so, the arrow2 crate has a good example.
Thanks, I managed to write a first prototype (link). I had to remove statistics, since it is difficult to pass max and min values through dyn traits. I am trying to run the loop over the columns in parallel with rayon, but FileWriter seems to need an Arc<Mutex>. Anyway, thanks again for pointing me to the documentation. You can keep the issue open as a reminder or close it. If I still have issues, I will open a new ticket specifically.
Cool!
For rayon, we have an example here: https://github.com/jorgecarleitao/arrow2/blob/main/examples/parquet_write_parallel/src/main.rs#L45
We currently only support an iterator (not a parallel iterator) of columns, which is why the columns must be pre-serialized in parallel before they are handed to the writer.
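To make the "pre-serialize, then write" pattern concrete, here is a minimal sketch using only the standard library. It is not parquet2's or arrow2's API: `serialize_column`, `serialize_columns_parallel`, and `write_columns` are hypothetical names, the byte encoding is a placeholder for real page encoding and compression, and `std::thread::scope` stands in for rayon. The point is only that the expensive per-column work runs in parallel, while the writer is driven from a single thread and so needs no Arc<Mutex>.

```rust
use std::io::{self, Write};
use std::thread;

/// Hypothetical stand-in for encoding one column into bytes.
/// NOT parquet2's API: a real writer would produce encoded, compressed pages.
fn serialize_column(values: &[i64]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Serialize each column on its own scoped thread, preserving column order.
/// With rayon, this step would be a parallel iterator over the columns.
fn serialize_columns_parallel(columns: &[Vec<i64>]) -> Vec<Vec<u8>> {
    thread::scope(|s| {
        // Spawn all threads first so the columns are processed concurrently.
        let handles: Vec<_> = columns
            .iter()
            .map(|col| s.spawn(move || serialize_column(col)))
            .collect();
        // Join in spawn order, so the output order matches the input order.
        handles
            .into_iter()
            .map(|h| h.join().expect("serialization thread panicked"))
            .collect()
    })
}

/// Write the pre-serialized columns sequentially from a single thread.
/// Returns the total number of bytes written.
fn write_columns<W: Write>(writer: &mut W, columns: Vec<Vec<u8>>) -> io::Result<usize> {
    let mut written = 0;
    for bytes in columns {
        writer.write_all(&bytes)?;
        written += bytes.len();
    }
    Ok(written)
}

fn main() -> io::Result<()> {
    // Data already held in columnar form in memory.
    let columns = vec![vec![1_i64, 2, 3], vec![10, 20, 30]];
    let serialized = serialize_columns_parallel(&columns);

    // Any `Write` implementor works here; a Vec<u8> keeps the sketch self-contained.
    let mut out = Vec::new();
    let n = write_columns(&mut out, serialized)?;
    println!("wrote {n} bytes");
    Ok(())
}
```

The trade-off of this design is memory: all serialized columns of a row group are held in memory before the sequential write starts.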