mpi4py: pml/ucx improbe use-after-free
Running mpi4py's test suite against current main with AddressSanitizer enabled, I get this use-after-free:
testIMProbe (test_p2p_buf_matched.TestP2PMatchedSelf.testIMProbe) ... =================================================================
==1536763==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000fc7b8 at pc 0x7fffea36c0c5 bp 0x7fffffff5970 sp 0x7fffffff5968
READ of size 8 at 0x6150000fc7b8 thread T0
#0 0x7fffea36c0c4 in ompi_request_check_same_instance ../../ompi/request/request.c:274
#1 0x7fffea539411 in PMPI_Waitall ompi/build/ompi/mpi/c/waitall_generated.c:66
#2 0x7fffeb0d1ac5 in __pyx_pf_6mpi4py_3MPI_7Request_28Waitall src/mpi4py/MPI.c:135255
[...]
0x6150000fc7b8 is located 440 bytes inside of 480-byte region [0x6150000fc600,0x6150000fc7e0)
freed by thread T0 here:
#0 0x7ffff77fd288 in __interceptor_free ../../../../libsanitizer/asan/asan_malloc_linux.cpp:52
#1 0x7fffea2d3e49 in ompi_comm_free ../../ompi/communicator/comm.c:2223
#2 0x7fffea3f63dd in PMPI_Comm_free ompi/build/ompi/mpi/c/comm_free_generated.c:62
#3 0x7fffeb046439 in __pyx_pf_6mpi4py_3MPI_4Comm_40Free src/mpi4py/MPI.c:167086
#4 0x7fffeb046439 in __pyx_pw_6mpi4py_3MPI_4Comm_41Free src/mpi4py/MPI.c:167044
previously allocated by thread T0 here:
#0 0x7ffff77fe5bf in __interceptor_malloc ../../../../libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x7fffea2c384e in opal_obj_new ../../opal/class/opal_object.h:495
#2 0x7fffea2c34af in opal_obj_new_debug ../../opal/class/opal_object.h:256
#3 0x7fffea2c516b in ompi_comm_set_nb ../../ompi/communicator/comm.c:220
#4 0x7fffea2c4ea3 in ompi_comm_set ../../ompi/communicator/comm.c:174
#5 0x7fffea2cc6a8 in ompi_comm_dup_with_info ../../ompi/communicator/comm.c:1339
#6 0x7fffea2cc413 in ompi_comm_dup ../../ompi/communicator/comm.c:1320
#7 0x7fffea3f4538 in PMPI_Comm_dup ompi/build/ompi/mpi/c/comm_dup_generated.c:75
#8 0x7fffeb07c0a6 in __pyx_pf_6mpi4py_3MPI_4Comm_26Dup src/mpi4py/MPI.c:165264
#9 0x7fffeb07c0a6 in __pyx_pw_6mpi4py_3MPI_4Comm_27Dup src/mpi4py/MPI.c:165174
I assume this happens because the improbe path in pml/ucx does not retain the communicator, so by the time MPI_Waitall accesses the communicator to check its instance, the communicator has already been released. If I disable pml/ucx via OMPI_MCA_pml=^ucx, the test passes.
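For reference, here is a minimal C sketch of the pattern I believe the mpi4py test exercises (my reconstruction from the backtrace, not the test itself, and not a verified standalone reproducer): improbe/imrecv on a duplicated communicator, MPI_Comm_free before the matched-receive request completes, then MPI_Waitall touching the request and, internally, its communicator.

```c
/* Sketch of the suspected trigger: free the communicator while a matched
 * receive started via MPI_Improbe/MPI_Imrecv is still pending, then wait.
 * Per the MPI standard this is legal: MPI_Comm_free only marks the
 * communicator for deallocation and pending operations must complete. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm dup;
    MPI_Comm_dup(MPI_COMM_SELF, &dup);   /* duplicated comm, freed below */

    int sbuf = 42, rbuf = 0;
    MPI_Request reqs[2];
    MPI_Isend(&sbuf, 1, MPI_INT, 0, 0, dup, &reqs[0]);

    int flag = 0;
    MPI_Message msg;
    MPI_Status st;
    while (!flag)                        /* matched probe for the self message */
        MPI_Improbe(0, 0, dup, &flag, &msg, &st);
    MPI_Imrecv(&rbuf, 1, MPI_INT, &msg, &reqs[1]);

    MPI_Comm_free(&dup);                 /* comm marked for deallocation while
                                            the imrecv request is still pending */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* ASan flags the read of the
                                                   freed comm around here */

    printf("received %d\n", rbuf);
    MPI_Finalize();
    return 0;
}
```

If the improbe request in pml/ucx held a reference on the communicator until completion, the ompi_comm_free call in the trace above would only drop its own reference rather than free the underlying object out from under the pending request.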