-o gpu-affinity=per-task choosing 'wrong' gpus on tioga

Open ryanday36 opened this issue 3 years ago • 5 comments

see also https://rzlc.llnl.gov/jira/browse/ELCAP-179

The short version of this, I think, is that cpu-affinity and gpu-affinity assign the lowest numbered CPUs and lowest numbered GPUs to the lowest numbered tasks, but on the El Cap hardware, the lowest numbered CPUs are not "closest" (by bandwidth) to the lowest numbered GPUs. The mapping actually looks like:

Processor 0 : GPUs 4,5
Processor 1 : GPUs 2,3
Processor 2 : GPUs 6,7
Processor 3 : GPUs 0,1

whereas '-o cpu-affinity=per-task -o gpu-affinity=per-task' currently gives:

Processor 0 : GPUs 0,1
Processor 1 : GPUs 2,3
Processor 2 : GPUs 4,5
Processor 3 : GPUs 6,7
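
To make the mismatch concrete, here is a minimal Python sketch that compares the two mappings quoted above. It is only an illustration, not flux-core code: the names `CLOSEST_GPUS`, `naive_assignment`, and `mismatched` are hypothetical, and the "lowest GPUs to lowest processors" rule is my reading of the current behavior as described in this issue.

```python
# Topology as reported in this issue: processor index -> closest GPUs.
CLOSEST_GPUS = {0: [4, 5], 1: [2, 3], 2: [6, 7], 3: [0, 1]}


def naive_assignment(num_procs, gpus_per_proc):
    """Hypothetical model of the current behavior: the lowest-numbered
    GPUs go to the lowest-numbered processors, in order."""
    return {
        p: list(range(p * gpus_per_proc, (p + 1) * gpus_per_proc))
        for p in range(num_procs)
    }


def mismatched(assigned, closest):
    """Return the processors whose assigned GPUs are not their closest GPUs."""
    return sorted(p for p in assigned if set(assigned[p]) != set(closest[p]))


naive = naive_assignment(num_procs=4, gpus_per_proc=2)
for proc in sorted(naive):
    print(f"Processor {proc}: assigned {naive[proc]}, closest {CLOSEST_GPUS[proc]}")

print("Mismatched processors:", mismatched(naive, CLOSEST_GPUS))
```

Under these assumptions only Processor 1 ends up with its closest GPUs; Processors 0, 2, and 3 are each paired with GPUs attached to a different processor.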

ryanday36, Sep 27 '22 15:09