Hui Zhou
> On my mac, I was able to link an MPI-IO program without link errors. Interesting. Did you build MPICH and then the standalone ROMIO from the same ROMIO source?...
For reference, `--gpus-per-proc` was added in this commit: https://github.com/pmodels/mpich/pull/4862/commits/2aa2a6cdf8bbce92fa3a3023efdb175a1cf2f8bc. `--gpus-per-proc` sets the environment variable `CUDA_VISIBLE_DEVICES`. Reference: https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
`-bind-to gpu1` is also supported; reference: https://github.com/pmodels/mpich/blob/7f8eefd25fe603ddf0e3ef6fdcabfc829a6d8890/src/pm/hydra/tools/topo/hwloc/topo_hwloc.c#L268-L287
Are we looking for options such as `mpiexec -bind-to {cuda1,cuda2,ze1,ze2}` etc.? If `hwloc` supports it, then it is just a matter of adding the name/alias to `topo_hwloc.c`. @yfguo @abrooks98 @zhenggb72...
Nightly tests have been clean for a while. It must have been fixed at some point.
@sagarth was able to reproduce the failure independent of xpmem: https://github.com/pmodels/mpich/pull/5375#issuecomment-904988673. The relevant information: > the behavior was the same during the timeouts: few ranks were stuck inside Init_shm_barrier while...
```
mpicc -o end_of_file end_of_file.c && mpirun -n 2 ./end_of_file tmp.txt
0: count was 10; expected 5
0: buffer[5] = 0; expected 99
0: buffer[6] = 0; expected 99
0:...
```
> **Originally by _robl_ on 2010-08-31 17:01:01 -0500** > > I looked at this for a while and got close but not quite there. > > the easy approach is...
My interpretation is that the standard simply points out the collective semantics -- that every process needs to call the same function. This is irrelevant to whether the implementation ensures collective behavior. Since...
> @hzhou Your interpretation of the standard is correct. But I think @wesbland's point is that the implementation should be correct even if some processes give a different value than...