yohplala
@Paul424 If you are interested in PR #689, you can clone the corresponding repo and give it a try. Beware that if you are accessing a remote parquet dataset using `fsspec`,...
No worries Martin! I find it nice that my test cases (for another lib based on fastparquet) supplement those of fastparquet :).
> @yohplala , it's probably time we follow suit from dask, spark, arrow... and explicitly allow for write and append without creating the global _metadata. It could even be the...
Hi @martindurant, I found feature request #244 (adding a new column to an existing parquet file). Do you think it could be revived? I have the same need as those...
> Do you mean supporting schema evolution (ability to add columns or promote types in later files of a dataset) or actually writing a new column to an existing file?...
> > In my mind, there is thus no trouble adding new columns in new files
>
> I perhaps should have been clearer: this means that you have new...
> Sounds like a specialist merge function that doesn't need to live in the library.

The concern is performance. As far as I understand fastparquet, in its current state, it...
> [...]. But it would not need to be part of the main API or even necessarily in this package.

Picking up on your comment :) I think there is...
> What you want already exists. Example:
>
> ```
> import fsspec
> fs = fsspec.filesystem("memory")
> df.to_parquet("memory://temp.parq", engine="fastparquet")
> ```
>
> the output is at `fs.store['/temp.parq']` (the...
Thanks a lot for your reply @martindurant !