hydra: Add assignment option for GPU
We could do something similar to what SLURM does for CUDA: https://slurm.schedmd.com/gres.html#GPU_Management.
We also need to investigate the assignment approach for AMD and Intel GPUs.
For reference, --gpus-per-proc was added in this commit: https://github.com/pmodels/mpich/pull/4862/commits/2aa2a6cdf8bbce92fa3a3023efdb175a1cf2f8bc
--gpus-per-proc will set the environment variable CUDA_VISIBLE_DEVICES. Reference - https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
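To make the idea concrete, here is a minimal sketch of how a launcher could derive a CUDA_VISIBLE_DEVICES value per process. This is not Hydra's actual code: the function name `visible_devices`, the contiguous block policy, and the wrap-around when GPUs are oversubscribed are all assumptions made for illustration.

```python
def visible_devices(local_rank: int, gpus_per_proc: int, num_gpus: int) -> str:
    """Return a CUDA_VISIBLE_DEVICES string for one process on a node.

    Hypothetical helper: assumes each local rank gets a contiguous block
    of GPU indices, wrapping around if the node is oversubscribed.
    """
    if gpus_per_proc <= 0 or num_gpus <= 0:
        raise ValueError("gpus_per_proc and num_gpus must be positive")
    first = local_rank * gpus_per_proc
    # Wrap-around is an assumed policy, not something taken from Hydra.
    ids = [(first + i) % num_gpus for i in range(gpus_per_proc)]
    return ",".join(str(i) for i in ids)


if __name__ == "__main__":
    # Four GPUs on the node, two GPUs per process.
    for rank in range(3):
        print(rank, visible_devices(rank, gpus_per_proc=2, num_gpus=4))
```

The launcher would then export the returned string into the environment of the corresponding process before exec.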
-bind-to gpu1 is also supported; see
https://github.com/pmodels/mpich/blob/7f8eefd25fe603ddf0e3ef6fdcabfc829a6d8890/src/pm/hydra/tools/topo/hwloc/topo_hwloc.c#L268-L287
Are we looking for options such as mpiexec -bind-to {cuda1,cuda2,ze1,ze2} etc.? If hwloc supports it, then it is just a matter of adding the name/alias into topo_hwloc.c. @yfguo @abrooks98 @zhenggb72 , can you confirm?
Yes, we are looking for options similar to -bind-to socket, but in this case to set up the affinity masks in the ranks based on the mapping of ranks to sockets and the GPUs connected to each socket. I do not think that the flags need to specify anything about the GPU type itself, although underneath we will need to discover the type of GPUs that we have.
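As a rough illustration of the socket-based mapping described above, the sketch below picks a GPU for each rank from the GPUs attached to the socket that rank is bound to. The function `assign_gpus_by_socket`, the input dictionaries, and the round-robin policy within a socket are hypothetical; a real implementation would obtain the topology from hwloc.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def assign_gpus_by_socket(
    rank_to_socket: Dict[int, int],
    socket_to_gpus: Dict[int, List[int]],
) -> Dict[int, Optional[int]]:
    """Map each rank to one GPU attached to the socket it is bound to.

    Hypothetical helper: ranks sharing a socket take that socket's GPUs
    round-robin; a rank on a socket with no GPU gets None.
    """
    ranks_seen = defaultdict(int)  # number of ranks already placed per socket
    assignment: Dict[int, Optional[int]] = {}
    for rank in sorted(rank_to_socket):
        socket = rank_to_socket[rank]
        gpus = socket_to_gpus.get(socket, [])
        if gpus:
            assignment[rank] = gpus[ranks_seen[socket] % len(gpus)]
        else:
            assignment[rank] = None
        ranks_seen[socket] += 1
    return assignment


if __name__ == "__main__":
    # Two sockets with two GPUs each, two ranks bound to each socket.
    print(assign_gpus_by_socket({0: 0, 1: 0, 2: 1, 3: 1}, {0: [0, 1], 1: [2, 3]}))
```

Note that the user-facing flag never names a GPU vendor here; only the topology discovery underneath would need to know the GPU type.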
BTW, support for Level Zero should be available in the hwloc master branch or in hwloc 2.5.
This issue is addressed by https://github.com/pmodels/mpich/pull/5870 (at least it was supposed to). If some of the details are still missing, either reopen with specific details or open a new issue.