bug: crashes with nonblocking collectives and isend/irecv

Open mpichbot opened this issue 9 years ago • 2 comments

Originally by robl on 2016-10-04 16:34:12 -0500


The code HXHIM (formerly known as MDHIM) sometimes (at our urging) tries to use MPI to communicate between entities. It does not go well.

That is, we implemented a simple MDHIM RPC loop in MPI and MARGO in a child thread, and in the main thread tested a bunch of MPI calls. We made sure we found spots where the MPI child thread interfered with the main thread. Then we re-implemented the RPC stuff with MARGO [an HPC-oriented RPC framework based on Mercury and Argobots] and made sure that worked. It did!

In MPICH, the implementation crashes on any collective combined with MPI_Isend/MPI_Irecv.

mpichbot, Oct 14 '16 19:10

Originally by robl on 2016-10-04 16:38:02 -0500


Attachment added: margo_mpi_test[1].tgz (8.0 KiB) test case for RPC-oriented workload

mpichbot, Oct 14 '16 19:10

The attached test case targets an older version of margo/mercury. I'll have to update it to our latest API.

roblatham00, Feb 27 '18 16:02