Patrick Hoefler

Results: 345 comments by Patrick Hoefler

Groupby doesn't support nunique approx; you can call nunique directly. Can you elaborate a bit on why that isn't sufficient?

What Dask version are you on? We need more information to give advice here.

And where is Dask blowing up memory? Normally, we trigger a shuffle for nunique that will make sure that the partition count doesn't change, i.e. we should not allocate to...

Closing this for now. Please ping to reopen when you have an example where this is an issue.

Thanks for your report. For context: we are basically using `df.groupby('a').b.sum() / df.groupby('a').b.count()` under the hood to compute the mean, which also fails. I agree this is not...
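To illustrate the sum-divided-by-count idea from the comment above, here is a minimal standard-library sketch. It is not Dask's actual code: the helper names `partition_sum_count` and `grouped_mean`, and the sample partitions, are hypothetical. It assumes each partition is a list of `(a, b)` rows and shows why a grouped mean can be assembled from per-partition sums and counts, so anything that breaks the sum or the count would also break the mean.

```python
from collections import defaultdict


def partition_sum_count(rows):
    """Per-partition step: sum and count of `b` for each key `a`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a, b in rows:
        sums[a] += b
        counts[a] += 1
    return sums, counts


def grouped_mean(partitions):
    """Combine per-partition sums and counts, then divide once at the end."""
    total_sums = defaultdict(float)
    total_counts = defaultdict(int)
    for rows in partitions:
        sums, counts = partition_sum_count(rows)
        for key, value in sums.items():
            total_sums[key] += value
        for key, value in counts.items():
            total_counts[key] += value
    # mean = sum / count, computed per group after combining all partitions
    return {key: total_sums[key] / total_counts[key] for key in total_sums}


# Hypothetical data: two partitions of (a, b) rows
partitions = [
    [("x", 1.0), ("y", 10.0)],
    [("x", 3.0), ("y", 20.0), ("x", 5.0)],
]
means = grouped_mean(partitions)
print(means)
```

Averaging the per-partition means directly would weight partitions incorrectly when group sizes differ, which is why the sketch carries sums and counts separately until the final division.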

cc @jrbourbeau @mrocklin for thoughts

@mrocklin and I chatted offline. We will nuke the 10 minutes to dask page and add some of the content back in a follow-up. We will move ahead...

The disadvantage with the tabs is that we can't link to them directly as far as I know (so I can't send someone to the array getting started guide), which...

Doesn't the reproducer from the issue work as a test if you remove the training part?