
gpu: Disable GPU awareness if no devices are found

Open raffenet opened this issue 3 years ago • 2 comments

A user suggested this for heterogeneous systems with CPU and GPU partitions. Some processes may be restricted from seeing GPU devices, in which case we should gracefully disable GPU awareness.

raffenet avatar Mar 31 '21 18:03 raffenet

See #5054 for the addition of a query API for the number of devices detected at initialization time. That would make a useful building block for this functionality.

raffenet avatar Mar 31 '21 20:03 raffenet

Currently, that would require overwriting MPIR_CVAR_ENABLE_GPU, and modifying a CVAR at runtime can get messy in general. We will probably need a separate global variable to replace the CVAR usage.

hzhou avatar Mar 31 '21 21:03 hzhou

I think overwriting MPIR_CVAR_ENABLE_GPU should be okay, since we would do it during init, before any other GPU functions are called.

hzhou avatar Aug 13 '22 18:08 hzhou
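
For readers following the thread, here is a minimal sketch of the approach from the last comment: clear the enable flag once during init if no devices are visible. It is not MPICH code. `enable_gpu_cvar` is a stand-in for MPIR_CVAR_ENABLE_GPU, and `query_device_count`, `stub_device_count`, `gpu_init`, and `gpu_is_enabled` are hypothetical names made up for illustration; a real implementation would ask the GPU runtime for the device count.

```c
#include <stdbool.h>

/* Stand-in for MPIR_CVAR_ENABLE_GPU (assumption: nonzero means the user
 * asked for GPU awareness). Not the real MPICH variable. */
static int enable_gpu_cvar = 1;

/* Hypothetical device query. The stub value lets the sketch run without
 * a GPU runtime; a real build would query the runtime here. */
static int stub_device_count = 0;

static int query_device_count(void)
{
    return stub_device_count;
}

/* Hypothetical init hook. It must run before any other GPU function,
 * so that nothing has read the flag before it is overwritten. */
static void gpu_init(void)
{
    if (enable_gpu_cvar && query_device_count() <= 0) {
        /* No visible devices: gracefully disable GPU awareness. */
        enable_gpu_cvar = 0;
    }
}

/* Every later GPU code path checks this instead of probing devices. */
static bool gpu_is_enabled(void)
{
    return enable_gpu_cvar != 0;
}
```

With `stub_device_count` left at 0, calling `gpu_init()` makes `gpu_is_enabled()` return false; with a positive count, the flag is left as the user set it. The alternative raised earlier in the thread, a separate global that holds the effective state while the CVAR keeps the user's request, would change only which variable `gpu_init()` writes.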