Matthew Rocklin
cc @mingwandroid
3.6 On Wed, May 30, 2018 at 1:19 AM, Matti Picus wrote: > What python are you using? See this issue in cpython, which was fixed for > python 3.3...
> By the way, an orthogonal possibility would be for Anaconda to link its Python builds with an alternative malloc library such as jemalloc. Perhaps that would improve the numbers,...
In conversation with @njsmith he raised a concern with the idea that Numpy might use a different malloc implementation than Python. Python might allocate some memory and then pass that...
This might happen in Dask when we get bytes from a socket and use those to create a Numpy array.
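A minimal sketch of that situation, assuming the bytes come from a plain Python `bytes` object (the `payload` variable here is a hypothetical stand-in for data read from a socket). `np.frombuffer` wraps the existing buffer instead of copying it, so the resulting array's memory was allocated by Python rather than by NumPy:

```python
import numpy as np

# Hypothetical stand-in for bytes received from a socket.
payload = np.arange(4, dtype="int64").tobytes()

# frombuffer creates a view over the bytes object; no copy is made,
# so the underlying memory was allocated by Python, not by NumPy.
arr = np.frombuffer(payload, dtype="int64")

print(arr.flags.owndata)    # the array does not own its memory
print(arr.flags.writeable)  # a view over immutable bytes is read-only

# An explicit copy moves the data into memory that NumPy allocates itself.
owned = arr.copy()
print(owned.flags.owndata)
```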
I don't know of any current work on this, but it would be in scope if anyone wants to start work on it. On Thu, Jan 10, 2019 at 5:41...
Yes On Thu, Jan 10, 2019 at 11:59 AM Pierre-Bartet wrote: > Isn't it done somewhere in dask.dataframe.set_index ? I'll have a look.
> Isn't there a way to delay such a two (or many) stages computation so that from a user point of view set_index (or sort) is just a single lazy...
Short term, I recommend that we improve the situation by defining a `DataFrame.__len__` method that calls `len` on one of its columns. This should cheaply improve things in a common...
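A rough sketch of the idea, using hypothetical stand-in `Column` and `DataFrame` classes rather than the real Dask ones. It only illustrates the delegation pattern, not the actual implementation:

```python
class Column:
    """Hypothetical stand-in for a single dataframe column."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)


class DataFrame:
    """Hypothetical stand-in for a dataframe holding named columns."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __len__(self):
        # Delegate to a single column instead of touching every column.
        if not self._columns:
            return 0
        first = next(iter(self._columns.values()))
        return len(first)


df = DataFrame({"x": Column([1, 2, 3]), "y": Column([4, 5, 6])})
print(len(df))
```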
I'd also like to sort partitions by the max value of the partition column, but I couldn't find an easy way to get statistics out of the metadata.