mpich
gpu: Disable GPU awareness if no devices are found
A user suggested this for heterogeneous systems with CPU and GPU partitions. Some processes may be restricted from seeing any GPU devices, in which case we should gracefully disable GPU awareness.
See #5054 for the addition of a query API for the number of devices detected at initialization time. That would make a useful building block for this functionality.
Currently, that would require overwriting `MPIR_CVAR_ENABLE_GPU`, which can get messy (as modifying a cvar at runtime does in general). We will probably need a separate global variable to replace the CVAR usage.
I think overwriting `MPIR_CVAR_ENABLE_GPU` should be okay since we are doing it during init before any other GPU functions.
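To make the two options concrete, here is a minimal sketch of the separate-global-variable approach. It is not MPICH code: `resolve_gpu_enabled`, `gpu_init`, `gpu_is_enabled`, and the `gpu_enabled` flag are hypothetical names, and the device count is passed in as a plain integer rather than coming from the query API in #5054. The idea is that the user's cvar value is read once at init and never modified, while the rest of the code consults the derived flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical runtime flag that GPU code paths would check instead of
 * reading the cvar directly. Not an actual MPICH variable. */
static bool gpu_enabled = false;

/* Decide whether GPU awareness should be active.
 * cvar_requested: the user's setting (stand-in for MPIR_CVAR_ENABLE_GPU).
 * device_count:   number of devices visible to this process at init. */
bool resolve_gpu_enabled(int cvar_requested, int device_count)
{
    assert(device_count >= 0);
    /* Enabled only if the user asked for it AND a device is visible. */
    return cvar_requested != 0 && device_count > 0;
}

/* Called once during initialization, before any other GPU function.
 * The cvar value is only read here; it is never overwritten. */
void gpu_init(int cvar_requested, int device_count)
{
    gpu_enabled = resolve_gpu_enabled(cvar_requested, device_count);
}

/* Accessor used everywhere else in place of the cvar. */
bool gpu_is_enabled(void)
{
    return gpu_enabled;
}
```

With this shape, a process in a CPU-only partition would call `gpu_init(1, 0)` and see `gpu_is_enabled()` return false, while the cvar still reflects what the user requested. Overwriting the cvar during init would be shorter, but it loses that distinction.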