Wei Zhang
@wckzhang is on vacation. @shijin-aws can you take a look at this?
There seem to be multiple issues. The following PR fixes one: https://github.com/open-mpi/ompi/pull/10462. With this PR, `1sided` passes.
Another PR, https://github.com/open-mpi/ompi/pull/10463, fixes the segfaults in `pp_1sided` and `halo_1sided_put_alloc_mem`.
The `c_accumulate` hang with EFA turns out to be a bug in the libfabric EFA installer. The fix is in https://github.com/ofiwg/libfabric/pull/7829. It will take a while for MTT to ingest the...
The remaining issues are:
1. `c_put_dynamic_self`/`c_get_dynamic_self` always hang, even with 2 ranks.
2. When btl/tcp is used, `c_get_accumulate_ddt1` and `c_get_accumulate_ddt2` segfault.
3. When btl/tcp is used, `c_accumulate` is...
The `c_put_dynamic_self`/`c_get_dynamic_self` hang will be fixed by PR https://github.com/open-mpi/ompi/pull/10473.
Remaining issues:
- With btl/ofi, `mt_1sided` segfaults.
- With btl/tcp:
  1. Multiple tests (`1sided`, `c_accumulate`, etc.) hang.
  2. `c_get_accumulate_ddt1` and `c_get_accumulate_ddt2` segfault.
Hi @chenshixinnb, I have a few questions:
1. What version of Open MPI are you using?
2. What version of libfabric are you using (you can get the info by...
Can you add `--mca mtl_base_verbose 100` to your `mpirun` command line and share the output?
Looks like the initialization of OFI (libfabric) failed:
```
[c-96-4-worker0001:03645] select: init returned failure for component ofi
[c-96-4-worker0001:03645] select: no component selected
[c-96-4-worker0001:03645] select: init returned failure for component cm...
```