distributed
Warn (and eventually raise) when client.scatter is used with Active Memory Manager enabled
Using scatter is generally not a good idea anymore and doesn't have any effect if the active memory manager is enabled. People frequently run into this or use scatter anyway, which is confusing and makes for a poor user experience.
We should raise a warning if scatter is used and point people to delayed, and probably raise an error at some later point or remove the method completely.
Using scatter is generally not a good idea anymore and doesn't have any effect if the active memory manager is enabled.
That's only partially true. What doesn't have an effect any more is scatter(..., broadcast=True).
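To make the narrower case concrete, here is a minimal sketch of what such a warning could look like. The helper name `warn_if_scatter_ineffective` and its `amm_enabled` flag are hypothetical and are not part of distributed; how the real client would detect that the Active Memory Manager is running is left out on purpose.

```python
import warnings


def warn_if_scatter_ineffective(broadcast: bool, amm_enabled: bool) -> bool:
    """Hypothetical check, not distributed's actual code.

    Warns only for the case discussed in this thread: broadcast=True
    while the Active Memory Manager is enabled. Returns True if a
    warning was emitted.
    """
    if broadcast and amm_enabled:
        warnings.warn(
            "scatter(..., broadcast=True) doesn't have an effect while the "
            "Active Memory Manager is enabled; consider dask.delayed or "
            "client.submit instead.",
            UserWarning,
            stacklevel=2,  # point at the caller rather than this helper
        )
        return True
    return False
```

A plain scatter without broadcast=True would pass through silently in this sketch, since that path still works.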
Good point, I mixed that up....
Is delayed generally better or is that incorrect?
Is delayed generally better or is that incorrect?
Nine times out of ten it is better. The difference between the two approaches is that scatter can take a direct path to the worker instead of proxying through the scheduler, at least if the network configuration allows it.
Even if scatter proxies through the scheduler, the scheduler just forwards the data and doesn't store a copy. This matters if the data is large, because with delayed the scheduler has to hold the delayed task, data included, in memory until the task is completed. It also means that delayed is more robust to failures, of course.
In the end it's a tradeoff: scatter gives slightly better performance, while delayed gives resilience at the cost of higher memory usage on the scheduler.
The safe but slightly more costly approach is delayed. Most end users will likely not be able to tell the two apart or judge the risks and costs properly, so the recommendation to use delayed (or client.submit) is certainly good.
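For reference, the two approaches look roughly like this side by side. This is only a sketch: it assumes dask and distributed are installed, uses an in-process cluster for brevity, and the names `total` and `compare_scatter_and_delayed` are made up for illustration. A small bytes object stands in for a large payload.

```python
from dask import delayed
from distributed import Client


def total(values):
    # Stand-in for real work on the payload.
    return sum(values)


def compare_scatter_and_delayed(payload):
    # In-process cluster with the dashboard disabled, just for the demo.
    with Client(processes=False, n_workers=1, threads_per_worker=1,
                dashboard_address=None) as client:
        # scatter: the data is moved to a worker up front and tasks
        # refer to it through the returned future.
        future = client.scatter(payload)
        via_scatter = client.submit(total, future).result()

        # delayed: the data travels as part of the task graph, so the
        # scheduler holds it until the task completes.
        lazy_payload = delayed(payload)
        via_delayed = delayed(total)(lazy_payload).compute()

    return via_scatter, via_delayed


if __name__ == "__main__":
    print(compare_scatter_and_delayed(bytes(range(256))))
```

Both paths return the same result; what differs is where the data lives while the task is pending, which is the tradeoff described above.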
Hi! I guess that this at least should be documented somewhere. The last message from @fjetter really explains things.
There are many places in the docs where scatter is suggested, but I understand that delaying an object is better and safer in most cases, right?
https://distributed.dask.org/en/stable/locality.html?highlight=scatter
https://distributed.dask.org/en/stable/api.html?highlight=scatter#distributed.Client.scatter
https://distributed.dask.org/en/stable/memory.html?highlight=scatter
https://distributed.dask.org/en/stable/resilience.html?highlight=scatter