mpich
Sessions: Issues after second init/fini round without MPI_Init
Reproducer
import mpi4py
mpi4py.rc.initialize = False
from mpi4py import MPI
assert not MPI.Is_initialized()
# first round
session = MPI.Session.Init()
session.Finalize()
print('OK')
# second round
session = MPI.Session.Init()
session.Finalize()
print('OK')
Unexpected Behavior
$ valgrind -q python test.py
OK
==1353978== Invalid read of size 1
==1353978== at 0x484C5F4: strcmp (vg_replace_strmem.c:927)
==1353978== by 0x13C74D34: MPII_Coll_init (coll_impl.c:74)
==1353978== by 0x13D5DAF7: MPII_Init_thread (mpir_init.c:180)
==1353978== by 0x13D5B146: MPIR_Session_init_impl (init_impl.c:145)
==1353978== by 0x137BF589: internal_Session_init (session_init.c:83)
==1353978== by 0x137BF6A1: PMPI_Session_init (session_init.c:141)
==1353978== by 0x135302AB: __pyx_pf_6mpi4py_3MPI_7Session_8Init (MPI.c:117714)
==1353978== by 0x1353017C: __pyx_pw_6mpi4py_3MPI_7Session_9Init (MPI.c:117653)
==1353978== by 0x4991160: cfunction_call (methodobject.c:543)
==1353978== by 0x498D1A7: _PyObject_MakeTpCall (call.c:215)
==1353978== by 0x498A5C3: UnknownInlinedFun (abstract.h:112)
==1353978== by 0x498A5C3: UnknownInlinedFun (abstract.h:99)
==1353978== by 0x498A5C3: UnknownInlinedFun (abstract.h:123)
==1353978== by 0x498A5C3: UnknownInlinedFun (ceval.c:5869)
==1353978== by 0x498A5C3: _PyEval_EvalFrameDefault (ceval.c:4181)
==1353978== by 0x49838D2: UnknownInlinedFun (pycore_ceval.h:46)
==1353978== by 0x49838D2: _PyEval_Vector (ceval.c:5065)
==1353978== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==1353978==
python:1353978 terminated with signal 11 at PC=484c5f4 SP=1ffeffed98. Backtrace:
/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so(_vgr20160ZU_libcZdsoZa_strcmp+0x4)[0x484c5f4]
/home/devel/mpi/mpich/dev/lib/libmpi.so.0(+0x637d35)[0x13c74d35]
/home/devel/mpi/mpich/dev/lib/libmpi.so.0(+0x720af8)[0x13d5daf8]
/home/devel/mpi/mpich/dev/lib/libmpi.so.0(+0x71e147)[0x13d5b147]
/home/devel/mpi/mpich/dev/lib/libmpi.so.0(+0x18258a)[0x137bf58a]
/home/devel/mpi/mpich/dev/lib/libmpi.so.0(MPI_Session_init+0x25)[0x137bf6a2]
/home/dalcinl/.local/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0xd42ac)[0x135302ac]
/home/dalcinl/.local/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0xd417d)[0x1353017d]
/lib64/libpython3.10.so.1.0(+0x11f161)[0x4991161]
/lib64/libpython3.10.so.1.0(_PyObject_MakeTpCall+0x78)[0x498d1a8]
/lib64/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x5f54)[0x498a5c4]
/lib64/libpython3.10.so.1.0(+0x1118d3)[0x49838d3]
/lib64/libpython3.10.so.1.0(PyEval_EvalCode+0x94)[0x49ffe14]
/lib64/libpython3.10.so.1.0(+0x1beca3)[0x4a30ca3]
/lib64/libpython3.10.so.1.0(+0x1ba15a)[0x4a2c15a]
/lib64/libpython3.10.so.1.0(+0x8d316)[0x48ff316]
/lib64/libpython3.10.so.1.0(_PyRun_SimpleFileObject+0x1a9)[0x4a265f9]
/lib64/libpython3.10.so.1.0(_PyRun_AnyFileObject+0x48)[0x4a263b8]
/lib64/libpython3.10.so.1.0(Py_RunMain+0x38c)[0x4a2356c]
/lib64/libpython3.10.so.1.0(Py_BytesMain+0x3b)[0x49efb5b]
/lib64/libc.so.6(+0x29550)[0x4bec550]
/lib64/libc.so.6(__libc_start_main+0x89)[0x4bec609]
python(_start+0x25)[0x109095]
Thanks for reporting. We have been waiting for users to report this issue to motivate us to complete the grunt work of fully clean finalization :). I suspect we have quite a few places where we do not reset the data to a clean state after finalizing.
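To make the suspected failure mode concrete, here is a minimal pure-Python sketch of the general pattern: finalize releases some data but leaves a "setup already done" guard set, so a second init skips setup and then reads the released value. This is only an illustration. The class and its names (`Library`, `_setup_done`, `_algorithm`, `finalize_buggy`, `finalize_clean`) are hypothetical and do not come from MPICH, and the actual cause of the NULL `strcmp` at coll_impl.c:74 has not been established here.

```python
# Hypothetical illustration only: none of these names come from MPICH.
class Library:
    def __init__(self):
        self._setup_done = False  # "run once" guard
        self._algorithm = None    # stands in for a string setting read at init

    def init(self):
        # One-time setup, skipped whenever the guard is already set.
        if not self._setup_done:
            self._algorithm = "auto"
            self._setup_done = True
        # Loosely analogous to a strcmp on the setting: fails if it is None.
        return self._algorithm.lower() == "auto"

    def finalize_buggy(self):
        # Releases the data but forgets to reset the guard.
        self._algorithm = None

    def finalize_clean(self):
        # Releases the data and restores the pristine pre-init state.
        self._algorithm = None
        self._setup_done = False


if __name__ == "__main__":
    lib = Library()
    lib.init()
    lib.finalize_buggy()
    try:
        lib.init()  # second round: guard still set, setting is None
    except AttributeError as exc:
        print("second init failed:", exc)

    lib = Library()
    lib.init()
    lib.finalize_clean()
    lib.init()  # second round works after a clean finalize
    print("OK")
```

With `finalize_buggy`, the first round works and the second init fails, which mirrors the shape of the reproducer above. With `finalize_clean`, any number of init/finalize rounds behaves the same as the first.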