distributed
Repeated calls to `_memory_color` take around 12% of the scheduler's CPU time
Describe the issue:
Follow-up to: https://github.com/dask/distributed/issues/8761
With the above issue already fixed in https://github.com/dask/distributed/pull/8762, the next largest CPU consumer on a scheduler with many workers (>2000) is the calls to `_cluster_memory_color`, more specifically `_memory_color`.
https://github.com/dask/distributed/blob/782050a3a4cf2abd450caa8adfaa912c22829e78/distributed/dashboard/components/scheduler.py#L391
As far as I can see, this colors the memory bar of a specific worker depending on whether it is deemed "good", "almost full" or "full".
Again, this shows up in a speedscope profile (recorded without the fix from PR 8762).
Could this be solved by binning the memory load and size (surely the coloring doesn't have to be so exact that it must be based on exact bytes of memory) and also caching the result of the coloring?
Surely one doesn't have to recalculate hundreds of times per second which color a worker process with, for example, 1024/4096 MiB of RAM should have, especially since the result doesn't change at all.
Environment:
- Dask version: 2024.7.0
- Python version: 3.10
- Operating System: Linux, Debian
- Install method (conda, pip, source): poetry/pip