parquet-python

Writing parquet files

Open peterbe opened this issue 9 years ago • 4 comments

Hi, we need to be able to write Python dicts to Parquet. What are the chances that you'll have time to work on this, i.e. a writer class?

My team is totally new to Parquet, so we have a lot to learn. We did see https://github.com/jcrobak/parquet-python/pull/13, which claims to add writer functionality, but that PR is out of sync and tries to solve a couple of other things at the same time.

Would appreciate your thoughts on this project's near future.

cc @adngdb

peterbe • May 31 '16 14:05

If you wish to write dicts, as opposed to tabular data, you may be better off looking at Avro. There are working Python libraries: avro (official, slow), fastavro and cyavro.

martindurant • Jun 03 '16 20:06

My stats team say they want it stored in parquet (in S3). I have many individual big dicts that I want to store. Most of them are 1-level dicts, so it's quite tabular. All of it needs to happen from CPython, not a JVM.

peterbe • Jun 06 '16 12:06
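Parquet is a columnar format, so "quite tabular" one-level dicts first have to be viewed as columns rather than rows. The sketch below shows that pivot in plain Python; `rows_to_columns` is a hypothetical helper, not part of parquet-python, and it assumes that a key missing from some dicts should become `None` (a null) in that column.

```python
from typing import Any, Dict, Iterable, List


def rows_to_columns(rows: Iterable[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented dicts into column lists; missing keys become None."""
    rows = list(rows)
    # Collect the union of keys in first-seen order so sparse dicts still line up.
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for key in row:
            columns.setdefault(key, [])
    # Fill each column with one value per row, using None where a key is absent.
    for name, values in columns.items():
        values.extend(row.get(name) for row in rows)
    return columns


rows = [{"id": 1, "name": "alpha"}, {"id": 2, "score": 0.9}]
cols = rows_to_columns(rows)
# → {'id': [1, 2], 'name': ['alpha', None], 'score': [None, 0.9]}
```

Every column ends up with the same length as the number of input dicts, which is the shape a columnar writer expects.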

In that case, you have two options: either wait for the ongoing work in Apache Arrow to enable conversion of pandas DataFrames to Parquet (and so, presumably, of any data structure you can store in a DataFrame), or, of course, work on the writer in this project. I personally have no plans to work on it in the near future.

martindurant • Jun 06 '16 12:06

Thanks! I appreciate the update and tips. I'll try to get a handle on the state of Python support inside Arrow. I see the code's there, but skimming through it, I only see support for reading (no idea of its completion state).

peterbe • Jun 06 '16 15:06