Hui Zhou
In `mpir_pmi.c`, `put_ex`/`get_ex` are essentially `put_binary`/`get_binary`, so yes, we should use the PMIx binary put/get directly.
test:mpich/pmi 1 failure with `ch4-ofi-pmi2`: ``` summary_junit_xml.1276 - ./coll/p_red 5 MPIR_CVAR_IREDUCE_DEVICE_COLLECTIVE=0 MPIR_CVAR_IREDUCE_INTRA_ALGORITHM=tsp_tree MPIR_CVAR_IREDUCE_TREE_TYPE=kary MPIR_CVAR_IREDUCE_TREE_KVAL=3 MPIR_CVAR_IREDUCE_TREE_PIPELINE_CHUNK_SIZE=4096 Failing for the past 2 builds (Since Unstable[#122](https://jenkins-pmrs.cels.anl.gov/job/mpich-review-pmi/compiler=gnu,jenkins_configure=pmi2,label=centos64_review,netmod=ch4-ofi/122/) ) [Took 10 sec.](https://jenkins-pmrs.cels.anl.gov/job/mpich-review-pmi/123/compiler=gnu,jenkins_configure=pmi2,label=centos64_review,netmod=ch4-ofi/testReport/(root)/summary_junit_xml/1276_____coll_p_red_5__MPIR_CVAR_IREDUCE_DEVICE_COLLECTIVE_0_MPIR_CVAR_IREDUCE_INTRA_ALGORITHM_tsp_tree_MPIR_CVAR_IREDUCE_TREE_TYPE_kary_MPIR_CVAR_IREDUCE_TREE_KVAL_3_MPIR_CVAR_IREDUCE_TREE_PIPELINE_CHUNK_SIZE_4096/history) Error Message not...
test:mpich/pmi ``` compiler=gnu,jenkins_configure=pmi2,label=centos64_review,netmod=ch4-ofi summary_junit_xml.2220 - ./spawn/spawn1 1 | 3 min 0 sec | 1 summary_junit_xml.2221 - ./spawn/spawn2 1 | 3 min 0 sec | 1 summary_junit_xml.2222 - ./spawn/spawninfo1 1 |...
test:mpich/pmi
test:mpich/pmi test:mpich/ch3/most test:mpich/ch4/most All clear ✔️
test:mpich/pmi
What is your use case where this becomes an issue?
Try this patch (https://github.com/pmodels/mpich/pull/5900) and see if it fixes the assertion error. That patch only prevents this assertion error in hydra. Last time I checked, I didn't encounter...
OK, I'll investigate and see about adding some basic checks for ch3. Note that the ch3 device is a legacy device and will receive only minimal maintenance. If deploying new MPICH is...
> Can you give me some pointer to "best" or standard practice, if MPI is the wrong layer ?

I would suggest preventing access to your cluster from the external internet...