distributed icon indicating copy to clipboard operation
distributed copied to clipboard

Remove idle and saturated sets from scheduler

Open fjetter opened this issue 4 months ago • 1 comments

This is an attempt at removing the idle/saturated/idle_task_count containers on the scheduler.

  • They are expensive to maintain. They inherently rely on occupancy which is a quantity that cannot be accurately maintained in constant time. We therefore fell back a while ago to decompose occupancy into various quantities that can be maintained online but the occupancy itself has to be computed on demand (scaling linearily with the number of task prefixes). This can be expensive considering how often check_idle_saturated has to be evaluated.
  • Their definition is arguably fragile and hard to grasp. Particularly since they rely on the also quite fragile quantity occupancy which historically has put network load on equal footing to compute load which caused plenty of confusion in the past. It also makes reliable testing notoriously difficult
  • Ultimately, these sets are just maintained for performance reasons. Apart from performance optimization, their only functional purpose is to control the color of the worker bars. Saturated workers (a state that is incredibly difficult to attain) are colored yellow. That's it.

While ripping this out I simplified the stealing code and eradicated a couple of sources of non-determinism

The most important change that is also causing some of the tests to fail is that in the current iteration I chose to define possible thieves on the basis of the number of threads that are available. Previously, all idle classified workers, i.e. all workers with less than half of average occupancy, were considered possible thieves. Therefore, this stealing code is much, much more conservative. That is much more reliable and predictable but is also much less aggressive and cannot enforce absolute homogeneity.

The lack of determinism often originate from the usage of sets or the lack of tie-breakers when sorting. I believe that even without removing the idle/saturated sets it makes sense to remove those sources of non-determinism. Particularly, since this can subtly also affect global task ordering.

I still have to run some actual tests on this. So far this is all rather theoretical

fjetter avatar Oct 11 '24 08:10 fjetter