
Repeated calls to `memory_color` take around 12% of the scheduler's CPU time

Open jonded94 opened this issue 7 months ago • 6 comments

Describe the issue:

Some follow-up to: https://github.com/dask/distributed/issues/8761

After the above issue was fixed in https://github.com/dask/distributed/pull/8762, the next biggest CPU consumer on a scheduler with many workers (>2000) is the calls to `_cluster_memory_color`, more specifically `_memory_color`.

https://github.com/dask/distributed/blob/782050a3a4cf2abd450caa8adfaa912c22829e78/distributed/dashboard/components/scheduler.py#L391

As far as I can see, this colors the memory bar of a specific worker depending on whether it's deemed "good", "almost full" or "full".

Again, a speedscope profile (recorded without the fix from PR 8762):

image

speedscope.json

Could this be solved by binning the memory load and size (surely the coloring doesn't have to be so exact that it has to be based on exact bytes of memory) and also caching the result of the coloring?

Surely one doesn't have to recalculate hundreds of times per second which color a worker process with, for example, 1024/4096 MiB RAM should have, especially since the result doesn't change at all.

Environment:

  • Dask version: 2024.7.0
  • Python version: 3.10
  • Operating System: Linux, Debian
  • Install method (conda, pip, source): poetry / pip

jonded94 · Jul 11 '24 17:07