Howard Pritchard
although it seems kind of old, this may be related to https://github.com/ofiwg/libfabric/issues/6710
@artpol84 does UCX support 1 and 2 byte atomics now?
closing as no longer being observed by @opoplawski
I believe hcoll is being replaced by UCC, so I don't think there's going to be an investment of time in embiggening hcoll unless it already is.
well the suggested code doesn't even compile but I'll see about using the idea
```
make[2]: Entering directory '/usr/projects/artab/users/hpp/ompi/ompi/datatype'
  CC       ompi_datatype_module.lo
In file included from ../../ompi/mca/coll/coll.h:85,
                 from ../../ompi/instance/instance.h:22,
                 from ompi_datatype_module.c:42:...
```
@bosilca okay there's a commit with your suggested change - modified so it's compilable and correct. @devreal what do you think of this?
Could you set this env variable in the shell where the parent process is started? ```export PMIX_MCA_gds=hash``` and rerun and see if the problem persists?
Using Open MPI main and PMIx at e32e0179 and PRRTE at d02ad07c3d I don't observe this behavior using 3 nodes of a Slurm-managed cluster. If I use the Open...
well I slightly amend my comment. It seems that if UCX is involved in any way, Open MPI main with embedded openpmix/prrte hangs. If I configure Open MPI with ```--with-ucx=no``` then...
Could you try the 5.0.x nightly tarball? See https://www.open-mpi.org/nightly/v5.0.x/ I'm noticing that with the 5.0.3 release I get a hang with your test but with the current head of 5.0.x...