DLA-Future
Investigate throttling the number of active algorithms ("unrolling factor")
Use some type of semaphore to limit the number of algorithms that can be scheduled concurrently, instead of "unrolling" the full pipeline in one go. This may improve memory locality, since sequential tasks would be more likely to run immediately after each other.
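As a rough illustration of the idea, here is a minimal sketch using plain standard-library threads rather than DLA-Future's actual task runtime. Everything in it is hypothetical: `SimpleSemaphore`, `run_throttled`, and the `max_active` parameter (standing in for the "unrolling factor") are names made up for this example, and the "algorithm" is just a short sleep. The submitting loop blocks until a slot is free, so at most `max_active` algorithms are in flight at any time.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal counting semaphore (not a DLA-Future type).
class SimpleSemaphore {
public:
  explicit SimpleSemaphore(int slots) : slots_(slots) {}

  // Block until a slot is available, then take it.
  void acquire() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return slots_ > 0; });
    --slots_;
  }

  // Return a slot and wake one waiter.
  void release() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++slots_;
    }
    cv_.notify_one();
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int slots_;
};

// Hypothetical driver: submits `num_algorithms` dummy algorithms while
// allowing at most `max_active` of them to be in flight at once.
// Returns the highest number of algorithms observed running concurrently.
inline int run_throttled(int num_algorithms, int max_active) {
  SimpleSemaphore slots(max_active);
  std::atomic<int> active{0};
  std::atomic<int> peak{0};

  std::vector<std::thread> workers;
  workers.reserve(num_algorithms);
  for (int i = 0; i < num_algorithms; ++i) {
    // Throttle at submission time: wait here instead of unrolling
    // the whole pipeline in one go.
    slots.acquire();
    workers.emplace_back([&slots, &active, &peak] {
      const int now = active.fetch_add(1) + 1;
      // Record the maximum concurrency seen so far.
      int prev = peak.load();
      while (prev < now && !peak.compare_exchange_weak(prev, now)) {
      }
      // Stand-in for the real work of one algorithm.
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
      active.fetch_sub(1);
      slots.release();
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return peak.load();
}
```

Calling `run_throttled(8, 2)` runs eight dummy algorithms with never more than two active at once. In a real implementation the slot would presumably be released when an algorithm's last task completes, and the right value for the limit is exactly what this issue proposes to investigate.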