
item.append doesn't pass npartitions to item.write

Open JugglingNumbers opened this issue 5 years ago • 3 comments

When using item.append(item, new_data, npartitions=35), the write function is passed npartitions=None. It should be npartitions=npartitions: https://github.com/ranaroussi/pystore/blob/40de1d51236fd6b6b88909c83dc6d7297de4b471/pystore/collection.py#L194-L196

JugglingNumbers avatar Feb 20 '20 15:02 JugglingNumbers

Hmm... changing this just results in a similar exception at line 182. I've changed it to:

new_npartitions = npartitions
if new_npartitions is None:
    memusage = data.memory_usage(deep=True).sum()
    new_npartitions = int(1 + memusage // DEFAULT_PARTITION_SIZE)

# combine old dataframe with new
current = self.item(item)
new = dd.from_pandas(data, npartitions=new_npartitions)

That seems to work at a first glance.
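The fallback logic above can be sketched standalone with pandas only. Note that infer_npartitions is a name invented for this sketch, and the DEFAULT_PARTITION_SIZE value below is illustrative rather than necessarily pystore's exact constant:

```python
import pandas as pd

# Illustrative partition-size budget in bytes (pystore defines its own constant).
DEFAULT_PARTITION_SIZE = 99 * 1024 ** 2

def infer_npartitions(data: pd.DataFrame, npartitions=None) -> int:
    """Respect an explicit npartitions; otherwise derive one from memory usage."""
    if npartitions is not None:
        return npartitions
    # deep=True accounts for object-dtype (e.g. string) columns as well.
    memusage = data.memory_usage(deep=True).sum()
    return int(1 + memusage // DEFAULT_PARTITION_SIZE)

df = pd.DataFrame({"x": range(1000)})
print(infer_npartitions(df))      # tiny frame -> 1 partition
print(infer_npartitions(df, 35))  # explicit value is respected -> 35
```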

ancher1912 avatar Mar 12 '20 20:03 ancher1912

@ancher1912 yup you've encountered the other append error: https://github.com/ranaroussi/pystore/issues/31

The other option is just to change the last line of your blurb to new = dd.from_pandas(data, npartitions=1)

Since the combined dask dataframe is partitioned according to the npartitions variable anyway, it doesn't matter if we use only one partition when converting the new dataframe to dask.

JugglingNumbers avatar Mar 13 '20 01:03 JugglingNumbers

Yeah, you're right. I've sent a pull request to @ranaroussi with your proposed change. At least I can continue doing what I was doing before I updated Dask and FastParquet.

ancher1912 avatar Mar 13 '20 08:03 ancher1912