Florian Jetter
> and their output (a bunch of None's)

`None` is a singleton. If all tasks are returning `None`, this will not add to the memory blowup. The `TaskState` objects will,...
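To illustrate the singleton point, here is a minimal sketch in plain Python (no Dask involved). The `results` list is a hypothetical stand-in for the outputs of many tasks that each return `None`; it is not how the scheduler actually stores results.

```python
import sys

# Hypothetical stand-in for the outputs of many tasks that each return None.
results = [None for _ in range(100_000)]

# Every element is a reference to the same singleton object.
all_same_object = all(r is None for r in results)
distinct_ids = {id(r) for r in results}

# The list itself still holds one reference per element,
# but no new object is allocated for any individual result.
list_bytes = sys.getsizeof(results)

print(all_same_object, len(distinct_ids), list_bytes)
```

The only per-result cost in this sketch is the reference held by the container, which is why the `None` outputs themselves should not be what drives memory growth.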
This is ready to merge. There are a few subtle breaking changes. However, they are so deeply buried that I do not believe a proper deprecation cycle is required. For...
The one test failure is https://github.com/dask/dask/issues/11460
> We would need to set up a lot more rules to only trigger certain workflows on certain file changes which would increase CI complexity even further. It's less clear...
> Sure if the source was completely separate then you could do that, but what value do you get from bringing things together if they are still separate? I guess...
Earlier today we merged the dask-expr repo into dask/dask (see https://github.com/dask/dask/pull/11623). dask/dask now includes the entire commit history of dask-expr. There are still a couple of cleanup tasks to be...
For the record, the dask-expr merge was done by following this blog post https://gfscott.com/blog/merge-git-repos-and-keep-commit-history/ to preserve the git history (probably similar to the SO post that was already recommended above)...
It might be helpful to run `analyze` on a subset of the data to see where things blow up, see https://github.com/dask/dask-expr/blob/1c646712b3c74eb9bed52bec59e442ce24d165c8/dask_expr/_collection.py#L479-L504 (not sure if we have "proper" docs for this)
Yes, this can cause problems if there are many values and the reduction is not able to shrink the data. You can force this to use multiple outputs with `value_counts(...,...
However, if you call `dask.compute` at the very end, this will fetch all the data to your computer regardless of the `split_out` parameter, which is something you might not want...