mpi4py: Memory corruption with pack/unpack external and MPI_C_LONG_DOUBLE_COMPLEX on macOS arm64

Open dalcinl opened this issue 1 year ago • 0 comments

This is with Open MPI 5.0.1 installed via Homebrew on macOS 14.3 arm64 (Apple Silicon).

$ python test/test_pack.py -k testPackUnpackExternal
[[email protected]] Python 3.11.7 (/opt/homebrew/opt/[email protected]/bin/python3.11)
[[email protected]] numpy 1.24.3 (/opt/homebrew/lib/python3.11/site-packages/numpy)
[[email protected]] MPI 3.1 (Open MPI 5.0.1)
[[email protected]] mpi4py 4.0.0.dev0 (/Users/dalcinl/Devel/mpi4py/src/mpi4py)
FPython(44412,0x1d8751c40) malloc: Heap corruption detected, free list is damaged at 0x600003f02380
*** Incorrect guard value: 0
Python(44412,0x1d8751c40) malloc: *** set a breakpoint in malloc_error_break to debug
[kw-18709:44412] *** Process received signal ***
[kw-18709:44412] Signal: Abort trap: 6 (6)
[kw-18709:44412] Signal code:  (0)
[kw-18709:44412] [ 0] 0   libsystem_platform.dylib            0x0000000181eb5a24 _sigtramp + 56
[kw-18709:44412] [ 1] 0   libsystem_pthread.dylib             0x0000000181e85cc0 pthread_kill + 288
[kw-18709:44412] [ 2] 0   libsystem_c.dylib                   0x0000000181d91a40 abort + 180
[kw-18709:44412] [ 3] 0   libsystem_malloc.dylib              0x0000000181ca8b08 malloc_vreport + 908
[kw-18709:44412] [ 4] 0   libsystem_malloc.dylib              0x0000000181cc824c malloc_zone_error + 104
[kw-18709:44412] [ 5] 0   libsystem_malloc.dylib              0x0000000181cba094 nanov2_guard_corruption_detected + 44
[kw-18709:44412] [ 6] 0   libsystem_malloc.dylib              0x0000000181cb92a8 _nanov2_free + 0
[kw-18709:44412] [ 7] 0   _multiarray_umath.cpython-311-darwi 0x000000010840a274 default_calloc + 160
[kw-18709:44412] [ 8] 0   _multiarray_umath.cpython-311-darwi 0x000000010840a3e4 PyDataMem_UserNEW_ZEROED + 68
[kw-18709:44412] [ 9] 0   _multiarray_umath.cpython-311-darwi 0x000000010845354c PyArray_NewFromDescr_int + 1992
[kw-18709:44412] [10] 0   _multiarray_umath.cpython-311-darwi 0x00000001084567a4 PyArray_Zeros + 96
[kw-18709:44412] [11] 0   _multiarray_umath.cpython-311-darwi 0x00000001084ddf28 array_zeros + 332
[kw-18709:44412] [12] 0   Python                              0x0000000104c69684 _PyEval_EvalFrameDefault + 46868
[kw-18709:44412] [13] 0   Python                              0x0000000104c6cb34 _PyEval_Vector + 116
[kw-18709:44412] [14] 0   Python                              0x0000000104b8a6b4 _PyObject_FastCallDictTstate + 96
[kw-18709:44412] [15] 0   Python                              0x0000000104bf4618 slot_tp_init + 188
[kw-18709:44412] [16] 0   Python                              0x0000000104becac4 type_call + 136
[kw-18709:44412] [17] 0   Python                              0x0000000104b8a408 _PyObject_MakeTpCall + 128
[kw-18709:44412] [18] 0   Python                              0x0000000104c684b4 _PyEval_EvalFrameDefault + 42308
[kw-18709:44412] [19] 0   Python                              0x0000000104c6cb34 _PyEval_Vector + 116
[kw-18709:44412] [20] 0   Python                              0x0000000104b8d810 method_vectorcall + 168
[kw-18709:44412] [21] 0   Python                              0x0000000104c6a2c8 _PyEval_EvalFrameDefault + 50008
[kw-18709:44412] [22] 0   Python                              0x0000000104c6cb34 _PyEval_Vector + 116
[kw-18709:44412] [23] 0   Python                              0x0000000104b8a6b4 _PyObject_FastCallDictTstate + 96
[kw-18709:44412] [24] 0   Python                              0x0000000104bf343c slot_tp_call + 188
[kw-18709:44412] [25] 0   Python                              0x0000000104b8a408 _PyObject_MakeTpCall + 128
[kw-18709:44412] [26] 0   Python                              0x0000000104c684b4 _PyEval_EvalFrameDefault + 42308
[kw-18709:44412] [27] 0   Python                              0x0000000104c6cb34 _PyEval_Vector + 116
[kw-18709:44412] [28] 0   Python                              0x0000000104b8d810 method_vectorcall + 168
[kw-18709:44412] [29] 0   Python                              0x0000000104c6a2c8 _PyEval_EvalFrameDefault + 50008
[kw-18709:44412] *** End of error message ***
zsh: abort      python test/test_pack.py -k testPackUnpackExternal

NOTE: I pushed the following commit to mpi4py@master to skip the test as a known failure. You need to `git revert f396b23` to reproduce:

commit f396b23da3055ed96146775aa439ad73906569e9 (HEAD -> master, origin/master, origin/HEAD)
Author: Lisandro Dalcin <[email protected]>
Date:   Sun Feb 4 18:10:36 2024 +0300

    test: Disable datatype=G pack/unpack openmpi macOS arm64

diff --git a/test/test_pack.py b/test/test_pack.py
index fb33f139..c683aa4e 100644
--- a/test/test_pack.py
+++ b/test/test_pack.py
@@ -1,7 +1,7 @@
 from mpi4py import MPI
 import mpiunittest as unittest
 import arrayimpl
-import os
+import os, sys, platform
 
 
 def allclose(a, b, rtol=1.e-5, atol=1.e-8):
@@ -143,6 +143,8 @@ if name == 'MPICH':
 elif name == 'Open MPI':
     if version < (5, 0, 0):
         BaseTestPackExternal.skipdtype += 'gG'
+    if (platform.system(), platform.machine()) == ('Darwin', 'arm64'):
+        BaseTestPackExternal.skipdtype += 'G'
 elif name == 'Intel MPI':
     BaseTestPackExternal.skipdtype += 'lgLG'
     BaseTestPackExternal.skipdtype += 'D'
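
For readers who want to check which typecodes end up skipped on a given setup, the branches visible in the diff above can be sketched as a standalone function. This is only an illustration: `external_skip_dtypes` is a hypothetical helper, not part of mpi4py's test suite, it models only the additions to `skipdtype` shown in the hunk, and it leaves out the MPICH branch because its body is not visible in the diff.

```python
import platform


def external_skip_dtypes(name, version, system=None, machine=None):
    """Return the typecodes appended to skipdtype by the branches in the diff.

    Hypothetical helper for illustration only. `name` and `version` stand
    for the MPI vendor name and version tuple used in test/test_pack.py.
    """
    # Default to the host platform, but allow overrides so other
    # platforms can be checked without running on them.
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine

    skip = ''
    if name == 'Open MPI':
        if version < (5, 0, 0):
            skip += 'gG'
        # Branch added by commit f396b23: skip typecode 'G' on macOS arm64.
        if (system, machine) == ('Darwin', 'arm64'):
            skip += 'G'
    elif name == 'Intel MPI':
        skip += 'lgLG'
        skip += 'D'
    # The MPICH branch is omitted: its body is not shown in the diff.
    return skip


if __name__ == '__main__':
    # The configuration reported in this issue.
    print(external_skip_dtypes('Open MPI', (5, 0, 1), 'Darwin', 'arm64'))
```

With the configuration from this report (Open MPI 5.0.1 on Darwin arm64), only `G` is skipped, which matches the commit subject "Disable datatype=G". Reverting the commit removes that branch, so the `G` case runs again and the crash can be reproduced.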

dalcinl, Feb 04 '24 15:02