Dagger.jl

Detect network structure for improved scheduling and parallelism

Open jpsamaroo opened this issue 5 years ago • 0 comments

Modern clusters (especially supercomputers) usually have more than one network device connecting a given pair of nodes, along with specialized network fabrics (e.g. InfiniBand, NVLink, ROCmRDMA) that can provide optimized transfers between processors when they are used. Our current processor infrastructure represents only (combined) memory and compute domains; it does not model the connections between and within those domains. If we could query the network and memory connections between processors, and then transfer data over a specifically selected connection, the scheduler could make much better decisions about where to place work, reducing data transfer latency and network bottlenecks.
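
To make the idea concrete, here is a rough, language-neutral sketch in Python. It is not Dagger.jl code and not a proposed API: `Link`, `Topology`, `best_link`, `place_task`, the processor names, and all latency/bandwidth figures are hypothetical and chosen purely for illustration. It assumes each connection can be summarized by a latency and a bandwidth, and that only direct links between two processors are considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One connection between two processors (hypothetical model)."""
    kind: str             # e.g. "ethernet", "infiniband"
    latency_s: float      # one-way latency in seconds
    bandwidth_bps: float  # sustained bandwidth in bytes per second

    def transfer_time(self, nbytes: int) -> float:
        # Simple cost model: fixed latency plus size divided by bandwidth.
        return self.latency_s + nbytes / self.bandwidth_bps


class Topology:
    """Undirected multigraph: a pair of processors may share several links."""

    def __init__(self):
        self._links = {}  # frozenset({a, b}) -> list of Link

    def add_link(self, a, b, link):
        self._links.setdefault(frozenset((a, b)), []).append(link)

    def best_link(self, src, dst, nbytes):
        """Return (link, seconds) for the fastest direct link.

        Data already on the destination needs no transfer: (None, 0.0).
        """
        if src == dst:
            return None, 0.0
        links = self._links.get(frozenset((src, dst)), [])
        if not links:
            raise LookupError(f"no direct link between {src} and {dst}")
        link = min(links, key=lambda l: l.transfer_time(nbytes))
        return link, link.transfer_time(nbytes)


def place_task(topo, candidates, inputs):
    """Pick the candidate processor with the lowest total input transfer time.

    `inputs` is a list of (current_location, nbytes) pairs.
    """
    def cost(proc):
        return sum(topo.best_link(loc, proc, n)[1] for loc, n in inputs)
    return min(candidates, key=cost)


# Illustrative numbers only: two links between the same pair of nodes.
topo = Topology()
topo.add_link("node1", "node2", Link("ethernet", 50e-6, 1.25e9))
topo.add_link("node1", "node2", Link("infiniband", 2e-6, 12.5e9))

link, seconds = topo.best_link("node1", "node2", 10**9)

# A task with a large input on node1 and a small input on node2.
chosen = place_task(topo, ["node1", "node2"],
                    [("node1", 10**9), ("node2", 10**3)])
```

With these made-up figures, `link.kind` is `"infiniband"` and `chosen` is `"node1"`, since moving the small input is far cheaper than moving the large one. A real implementation would also need to discover the links at runtime and handle multi-hop routes and contention, none of which this sketch attempts.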

jpsamaroo, May 06 '20 17:05