Florian Jetter comments

Results: 288 comments of Florian Jetter


Mild memory leak in dask workers

> and their output (a bunch of None's)

`None` is a singleton. If all tasks are returning `None`, this will not add to the memory blowup. The `TaskState` objects will,...
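The singleton point can be checked in plain Python. The following is a minimal sketch, not taken from the original thread: it uses a hypothetical list of 100,000 task results and involves no dask objects. It only illustrates that many `None` results all refer to one shared object, so the results themselves cost no more than the container's reference slots.

```python
import sys

# Hypothetical stand-in for the outputs of many tasks that all return None.
results = [None for _ in range(100_000)]

# Every entry refers to the same object, so no new objects are allocated per result.
all_same = all(r is None for r in results)
distinct_ids = {id(r) for r in results}

# Only the list's own reference slots take memory; the exact size is platform dependent.
list_bytes = sys.getsizeof(results)

print(all_same, len(distinct_ids), list_bytes)
```

Any per-task memory growth would therefore have to come from per-task bookkeeping rather than from the `None` values themselves.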

Use TaskSpec in local dask execution

This is ready to merge. There are a few subtle breaking changes. However, they are so deeply buried that I do not believe a proper deprecation cycle is required. For...

Use TaskSpec in local dask execution

The one test failure is https://github.com/dask/dask/issues/11460

Merge dask and distributed repos?

> We would need to set up a lot more rules to only trigger certain workflows on certain file changes which would increase CI complexity even further. It's less clear...

Merge dask and distributed repos?

> Sure if the source was completely separate then you could do that, but what value do you get from bringing things together if they are still separate? I guess...

Merge dask and distributed repos?

Earlier today we merged the dask-expr repo into dask/dask (see https://github.com/dask/dask/pull/11623). dask/dask now includes the entire commit history of dask-expr. There are still a couple of cleanup tasks to be...

Merge dask and distributed repos?

For the record, the dask-expr merge was done by following this blog post https://gfscott.com/blog/merge-git-repos-and-keep-commit-history/ to preserve the git history (probably similar to the SO post that was already recommended above)...

Out of memory

It might be helpful to run `analyze` on a subset of the data to see where things blow up, see https://github.com/dask/dask-expr/blob/1c646712b3c74eb9bed52bec59e442ce24d165c8/dask_expr/_collection.py#L479-L504 (not sure if we have "proper" docs for this)

Out of memory

Yes, this can cause problems if there are many values and the reduction is not able to shrink the data. You can force this to use multiple outputs with `value_counts(...,...

Out of memory

However, if you call `dask.compute` at the very end, this will fetch all the data to your computer regardless of the `split_out` parameter, which is something you might not want...