
Running jobs: task affinity

Open SteVwonder opened this issue 5 years ago • 3 comments

Flux will automatically set the CPU affinity and set CUDA_VISIBLE_DEVICES based on the cores and GPUs allocated to a job. If you are launching multiple tasks in a job, then you may be interested in the shell options “cpu-affinity” and “gpu-affinity”.

If you launch 2 tasks with flux mini run -n2 -N1 or flux mini run -n2 -N1 -o cpu-affinity=on -o gpu-affinity=on, both tasks/processes will see the same 2 cores and GPUs. If you launch 2 tasks with flux mini run -n2 -o cpu-affinity=per-task -o gpu-affinity=per-task, then each task will only see its own unique core and GPU. If you launch 2 tasks with flux mini run -n2 -o cpu-affinity=off -o gpu-affinity=off, then each task/process will see everything on the entire node.

Note: You can easily test and inspect the effects of the various affinity policies by running lstopo --restrict binding as the job's task command (e.g., flux mini run -n2 -N1 -o cpu-affinity=per-task lstopo --restrict binding).
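If lstopo is not available, a rough alternative is to have each task print what it can see. The sketch below is only an illustration: it relies on Python's os.sched_getaffinity (Linux only) and on the CUDA_VISIBLE_DEVICES variable mentioned above. It also assumes the Flux shell exports FLUX_TASK_RANK to each task; if that variable is missing, the rank is reported as "unknown". The file name show_binding.py is hypothetical.

```python
import os


def describe_binding():
    """Return what this process can see: its rank, allowed CPUs, and visible GPUs."""
    # os.sched_getaffinity is Linux-only; report None on other platforms.
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))
    else:
        cpus = None
    return {
        # Assumption: the Flux shell sets FLUX_TASK_RANK for each task.
        "rank": os.environ.get("FLUX_TASK_RANK", "unknown"),
        "cpus": cpus,
        # Set by Flux based on the GPUs allocated to the job or task.
        "gpus": os.environ.get("CUDA_VISIBLE_DEVICES", "unset"),
    }


if __name__ == "__main__":
    info = describe_binding()
    print(f"rank={info['rank']} cpus={info['cpus']} gpus={info['gpus']}")
```

Saved as show_binding.py, it could be launched with, for example, flux mini run -n2 -o cpu-affinity=per-task -o gpu-affinity=per-task python3 show_binding.py. With per-task affinity each rank should report a different CPU list and GPU; with affinity off, the lists should cover the whole node.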

SteVwonder, Jul 22 '20 03:07

@dongahn and others: would it be possible to get this added to the CORAL 1 documentation? Specifically, that flux mini run -n2 -o cpu-affinity=per-task -o gpu-affinity=per-task is equivalent to jsrun -n 2 -r 2 -c 1 -g 1. While using the newer version of Flux on Summit, I once again ran into the slowdown that we originally saw back in November for my use case. The reason is that my old Flux invocation, which looked something like flux mini run -n2 gpu-affinity=per-task, no longer had each MPI rank see only 1 unique core and GPU. I was able to fix this by using flux mini run -n2 -o cpu-affinity=per-task -o gpu-affinity=per-task instead.

rcarson3, Oct 29 '21 16:10

@rcarson3:

Yes! This is a common gotcha that Flux users on CORAL systems have reported. We will add -o cpu-affinity=per-task -o gpu-affinity=per-task to our CORAL1 section. Sorry that we didn't have this documented earlier.

dongahn, Oct 29 '21 16:10

PR #111 was just posted.

dongahn, Oct 30 '21 06:10