Hui Zhou
test:mpich/ch4/most test:mpich/ch3/most
test:mpich/ch4/most test:mpich/ch3/most

1 failure - `ch4-ucx-external` - `[coll.01376 - ./coll/reduce 10 MPIR_CVAR_REDUCE_POSIX_INTRA_ALGORITHM=release_gather MPIR_CVAR_COLL_SHM_LIMIT_PER_NODE=131072 MPIR_CVAR_REDUCE_INTRANODE_BUFFER_TOTAL_SIZE=32768 MPIR_CVAR_REDUCE_INTRANODE_NUM_CELLS=4 MPIR_CVAR_REDUCE_INTRANODE_TREE_KVAL=8 MPIR_CVAR_REDUCE_INTRANODE_TREE_TYPE=knomial_2](https://jenkins-pmrs.cels.anl.gov/job/mpich-review-ch4-ucx/3401/jenkins_configure=external,label=ubuntu22.04_review/testReport/junit/(root)/coll/01376_____coll_reduce_10__MPIR_CVAR_REDUCE_POSIX_INTRA_ALGORITHM_release_gather_MPIR_CVAR_COLL_SHM_LIMIT_PER_NODE_131072_MPIR_CVAR_REDUCE_INTRANODE_BUFFER_TOTAL_SIZE_32768_MPIR_CVAR_REDUCE_INTRANODE_NUM_CELLS_4_MPIR_CVAR_REDUCE_INTRANODE_TREE_KVAL_8_MPIR_CVAR_REDUCE_INTRANODE_TREE_TYPE_knomial_2/)`
Trying to get a clean test: test:mpich/ch4/ucx
> > * `MPIR_Subgroup` differs from `MPIR_Group` as the latter does not live inside a communicator, thus overly complex and inefficient to use.
> >
> > What is meant by `MPIR_Group`...
test:mpich/ch4/most test:mpich/ch3/most

Only 2 timeouts in `ch4-ofi-default`, due to congestion:
```
datatype.01767 - ./datatype/large_type_sendrec 2 33
coll.00127 - ./coll/gather_big 8
```
test:mpich/pmi test:mpich/ch4/xpmem
test:mpich/ch4/most test:mpich/ch3/most
@zhenggb72 What is your suggested path forward?
We are delaying this PR to 4.4. To help merge this PR, we'll need more performance measurements to quantify the memory and performance impact. We should also get more input to ensure...
💡 Maybe we can simplify this PR if we are able to create a *lightweight* communicator from a `coll_group` that shares the `context_id` of its parent.
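
To make the idea concrete, here is a rough sketch of what a lightweight communicator could look like. Everything in it is hypothetical: `sketch_comm`, `sketch_coll_group`, and both functions are simplified stand-ins invented for illustration, not the actual MPICH structures or API. The only point it tries to show is that the child copies the parent's `context_id` rather than allocating a new one, and borrows the group's rank table instead of duplicating it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified types for illustration only.
 * These are NOT the real MPICH definitions. */
typedef struct sketch_comm {
    int context_id;             /* identifies the communication context */
    int rank;                   /* this process's rank in the communicator */
    int size;                   /* number of processes in the communicator */
    struct sketch_comm *parent; /* NULL for a regular communicator */
    const int *ranks_in_parent; /* local rank -> parent rank; borrowed */
} sketch_comm;

typedef struct sketch_coll_group {
    int rank;         /* this process's rank within the group */
    int size;         /* number of processes in the group */
    const int *ranks; /* parent ranks of the group members */
} sketch_coll_group;

/* Create a lightweight communicator from a group. No new context id is
 * allocated: the child reuses the parent's. Returns NULL on bad input
 * or allocation failure. The caller releases the result with free(). */
sketch_comm *sketch_comm_create_lightweight(sketch_comm *parent,
                                            const sketch_coll_group *grp)
{
    if (parent == NULL || grp == NULL || grp->ranks == NULL)
        return NULL;
    if (grp->size <= 0 || grp->rank < 0 || grp->rank >= grp->size)
        return NULL;

    sketch_comm *child = malloc(sizeof(*child));
    if (child == NULL)
        return NULL;

    child->context_id = parent->context_id; /* shared with the parent */
    child->rank = grp->rank;
    child->size = grp->size;
    child->parent = parent;
    child->ranks_in_parent = grp->ranks;    /* borrowed, not copied */
    return child;
}

/* Translate a rank in the lightweight communicator to a parent rank.
 * Returns -1 if the rank is out of range. */
int sketch_comm_parent_rank(const sketch_comm *comm, int rank)
{
    if (comm == NULL || rank < 0 || rank >= comm->size)
        return -1;
    if (comm->ranks_in_parent == NULL)
        return rank; /* a regular communicator maps to itself */
    return comm->ranks_in_parent[rank];
}
```

Because the child only borrows the rank table, the group must outlive it in this sketch. How subgroup traffic would be kept apart from parent traffic under a shared `context_id` is left open here.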