yohplala comments

Results: 63 comments of yohplala

Make _metadata optional on writing

Note: I edited my previous comment and removed the parts that I no longer think were adequate. The way of setting `write_fmd` from the `Parquetfile.fn` attribute could be in a...

Make _metadata optional on writing

Hi, yes, we can have an enum; I don't see any problem with that. Maybe I would rename/reword it this way?

```python
import enum

class MDWriteMode(enum.Enum):
    ALL_META = 1
    ONLY_COMMON = 2
    ...
```
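To make the proposal concrete, here is a minimal sketch of how such an enum could drive the choice of metadata files. Only the two member names visible in the comment are used. The helper `metadata_files_for` is hypothetical, and the mapping of each mode to the `_metadata` and `_common_metadata` files is an assumption inferred from the member names, not fastparquet's actual behavior.

```python
import enum


class MDWriteMode(enum.Enum):
    # Member names come from the comment; what each one means below
    # is an assumption made for illustration only.
    ALL_META = 1
    ONLY_COMMON = 2


def metadata_files_for(mode: MDWriteMode) -> list[str]:
    """Hypothetical helper: list the metadata files a writer would emit."""
    if mode is MDWriteMode.ALL_META:
        # Assumed: write both the full and the common metadata file.
        return ["_metadata", "_common_metadata"]
    if mode is MDWriteMode.ONLY_COMMON:
        # Assumed: skip `_metadata`, keep only the schema-level file.
        return ["_common_metadata"]
    raise ValueError(f"unsupported mode: {mode!r}")


print(metadata_files_for(MDWriteMode.ONLY_COMMON))
```

An enum keeps the set of allowed modes explicit, so a typo in the mode raises an error immediately instead of silently selecting a default.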

Make _metadata optional on writing

The "only" point I see that needs some discussion is what to do when appending to a hive dataset whose file naming does not follow fastparquet's convention (as you also identified). I...

Make _metadata optional on writing

Hi Martin, regarding your last comment: not being familiar with pyarrow at all, I ran a test. I could not figure out how to tell it to write...

Appending to an existing parquet data set seems to corrupt it. ValueError: Seek before start of file

Hi @MosheVai. As indicated by @martindurant: 'With the code snippet as written, I think I would expect the second write to simply clobber the first.' More precisely, this line in...

Appending to an existing parquet data set seems to corrupt it. ValueError: Seek before start of file

I just had a look at the Dask API documentation for `to_parquet` [here](https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.to_parquet). It states that, by default, it does not append: 'append: bool, optional. If False (default), construct data-set from scratch. [...]'

yohplala

Make _metadata optional on writing

Make _metadata optional on writing

Make _metadata optional on writing

Make _metadata optional on writing

Appending to an existing parquet data set seems to corrupt it. ValueError: Seek before start of file

Appending to an existing parquet data set seems to corrupt it. ValueError: Seek before start of file

TypeError: assign() keywords must be strings

Incorrect roundtrip of index names on filtered dataframe

Incorrect roundtrip of index names on filtered dataframe

Ability to drop a partition (hive partitioned format)