Dagger.jl
Determine work assignment and data movement based on runtime-collected metrics
While round-robin work assignment is fine when first launching work without any prior knowledge, it is less efficient when individual work items vary widely in duration and the data being moved varies in size. We already have the necessary infrastructure built into Dagger to monitor work and data-movement latencies; we just need to tell the scheduler how to use this information to its benefit.
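As a rough illustration only (not Dagger's actual logging API), per-task runtime metrics could be accumulated in something as simple as a dictionary of running means that the scheduler later queries when estimating costs:

```julia
# Illustrative sketch; Dagger's real metric collection lives in its
# logging/scheduler internals. Here we just keep a running mean of observed
# task runtimes, keyed by the function being executed.
mutable struct RunningMean
    total::Float64
    count::Int
end
RunningMean() = RunningMean(0.0, 0)
update!(m::RunningMean, x) = (m.total += x; m.count += 1; m)
mean_of(m::RunningMean) = m.count == 0 ? 0.0 : m.total / m.count

const task_times = Dict{Any,RunningMean}()

# Wrap task execution to record how long each function takes.
function timed_run(f, args...)
    t = @elapsed result = f(args...)
    update!(get!(RunningMean, task_times, f), t)
    return result
end
```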
I believe that we could use a simple runtime-derived cost model, plus a numerical optimizer, to allow the scheduler to make better decisions. We could also incorporate information about processor hierarchies to further refine the model, capturing latencies due to memory transfer between levels of that hierarchy (e.g. NUMA domains, CPU-GPU transfers, disk-backed access latency, etc.).
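A minimal sketch of the decision itself, with purely illustrative names (none of these types or functions are Dagger internals): estimate each processor's cost as its measured mean compute time plus the time to move every input chunk into that processor's memory domain, then pick the cheapest processor. A real implementation would feed such estimates into a numerical optimizer rather than a greedy argmin.

```julia
# Hypothetical cost model; all names and numbers here are assumptions.
struct ProcEstimate
    domain::Symbol                          # memory domain this processor lives in
    mean_compute::Float64                   # measured mean task runtime (s)
    bandwidth_from::Dict{Symbol,Float64}    # bytes/s from other memory domains
end

# Cost = compute time + time to transfer each input chunk from where it lives.
function estimated_cost(p::ProcEstimate, chunk_sizes, chunk_locations)
    transfer = 0.0
    for (sz, loc) in zip(chunk_sizes, chunk_locations)
        loc == p.domain && continue                        # data already local
        transfer += sz / get(p.bandwidth_from, loc, 1.0e9)  # else pay the move
    end
    return p.mean_compute + transfer
end

# Greedy assignment: pick the processor with the lowest estimated cost.
best_processor(procs::Dict{Symbol,ProcEstimate}, sizes, locs) =
    argmin(name -> estimated_cost(procs[name], sizes, locs), keys(procs))
```

For example, a fast GPU can still lose to a slower CPU once the transfer term dominates:

```julia
procs = Dict(
    :cpu => ProcEstimate(:host, 0.10, Dict{Symbol,Float64}()),
    :gpu => ProcEstimate(:gpu,  0.01, Dict(:host => 2.0e9)),
)
best_processor(procs, [10^9], [:host])  # 0.5 s of transfer outweighs the GPU speedup -> :cpu
```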
Recent work by @stevengj might be useful here: https://arxiv.org/abs/2003.04287
This was implemented a while back, so closing.