yohplala
Note: I edited my previous comment and removed the parts I no longer consider adequate. The way of setting `write_fmd` from the `ParquetFile.fn` attribute could be in a...
Hi, yes we can have an enum, I don't see a problem with that. I would maybe rename/reword it this way?

```python
import enum


class MDWriteMode(enum.Enum):
    ALL_META = 1
    ONLY_COMMON = 2
    ...
```
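To make the enum proposal concrete, here is a minimal sketch of how such a mode could be consumed by a writer. Only the two members visible in the comment are used (the original list is truncated). The helper `metadata_files_for` and the mapping of each mode to the `_metadata` / `_common_metadata` file names are assumptions made for illustration, not fastparquet's actual API.

```python
import enum


class MDWriteMode(enum.Enum):
    # Only the two members visible in the comment; the original is truncated.
    ALL_META = 1
    ONLY_COMMON = 2


def metadata_files_for(mode):
    """Hypothetical helper: return the metadata file names a mode would write.

    The mapping below is an assumption for illustration only.
    """
    if not isinstance(mode, MDWriteMode):
        raise TypeError(f"expected MDWriteMode, got {type(mode).__name__}")
    if mode is MDWriteMode.ALL_META:
        return ["_metadata", "_common_metadata"]
    if mode is MDWriteMode.ONLY_COMMON:
        return ["_common_metadata"]
    raise ValueError(f"unhandled mode: {mode!r}")
```

Compared with a plain boolean or string flag, an enum makes an invalid mode fail early and keeps the set of allowed values discoverable in one place.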
The only point I see that needs discussion is what to do when appending to a hive dataset whose file naming does not follow fastparquet's convention (as you also identified). I...
Hi Martin, regarding your last comment: not being familiar with pyarrow at all, I ran a test. I could not figure out how you can tell it to write...
Appending to an existing parquet data set seems to corrupt it. ValueError: Seek before start of file
Hi @MosheVai, as indicated by @martindurant: 'With the code snippet as written, I think I would expect the second write to simply clobber the first.' More precisely, this line in...
Appending to an existing parquet data set seems to corrupt it. ValueError: Seek before start of file
I just had a look at the Dask API documentation for `to_parquet` [here](https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.to_parquet). It states that by default it does not append: 'append: bool, optional. If False (default), construct data-set from scratch. [...]'
> I think @yohplala fixed the issue with `None` names of multi-index levels recently.

Hi, what I solved for column multi-index in the recently merged PR is...
Hello, I believe this is expected fastparquet behavior. When you filter, I am guessing that the index of the dataframe is no longer a range index...
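The guess about the index can be checked with pandas alone. The sketch below uses a made-up dataframe and column name `a` (both assumptions for illustration) to show that boolean filtering keeps the original row labels, so the result no longer carries a `RangeIndex`, and that `reset_index(drop=True)` restores one.

```python
import pandas as pd

# Illustrative data; the column name "a" is an assumption.
df = pd.DataFrame({"a": [1, 2, 3, 4]})
print(type(df.index).__name__)  # default index is a RangeIndex

# Boolean filtering keeps the surviving rows' original labels.
filtered = df[df["a"] > 2]
print(list(filtered.index))
print(isinstance(filtered.index, pd.RangeIndex))

# Dropping the old labels gives back a fresh 0..n-1 RangeIndex.
restored = filtered.reset_index(drop=True)
print(isinstance(restored.index, pd.RangeIndex))
```

Whether resetting the index is appropriate before writing depends on whether the original row labels carry meaning in the dataset.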
Hi @martindurant, to be honest, I have no need for this, and I am only able to code in my spare time, a few hours per week. So this will be a...
Hi @Paul424, what a coincidence :) Your ticket appears to me closely related to #676, and I am actually working on it at this very moment through PR #689. Applied...