Possible heap-use-after-free reported by AddressSanitizer inside PMPI_Init call
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v4.1.3
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Installed with Spack
Please describe the system on which you are running
- Operating system/version: CentOS 8
- Computer hardware: AMD Epyc 7532 processors (32 cores per CPU, 2.4 GHz)
- Network type: N.A.
Details of the problem
This issue occurs on Chrysalis, a machine used by E3SM (e3sm.org): https://e3sm.org/model/running-e3sm/supported-machines/chrysalis-anl
Modules used: gcc/9.2.0-ugetvbp openmpi/4.1.3-sxfyy4k
A minimal MPI program is built with GCC's AddressSanitizer:
module load gcc/9.2.0-ugetvbp openmpi/4.1.3-sxfyy4k
cat <<EOF > test_mpi.c
#include <mpi.h>
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
EOF
mpicc -fsanitize=address -static-libasan test_mpi.c
The executable is launched with srun inside a Slurm job. AddressSanitizer reports the following error:
=================================================================
==268787==ERROR: AddressSanitizer: heap-use-after-free on address 0x61900001f480 at pc 0x000000448c68 bp 0x7fffffffb4e0 sp 0x7fffffffac90
READ of size 2 at 0x61900001f480 thread T0
#0 0x448c67 in __interceptor_strlen /tmp/svcbuilder/spack-stage-gcc-9.2.0-ugetvbp5jl5kgy7jwjloyf73vnhhw7db/spack-src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:354
#1 0x155552d0b9a4 in opal_pmix_base_partial_commit_packed base/pmix_base_fns.c:405
#2 0x155552d0d423 in s2_put /tmp/svcbuilder/spack-stage-openmpi-4.1.3-sxfyy4knvddpewshfcc45heice7tzs7f/spack-src/opal/mca/pmix/s2/pmix_s2.c:548
#3 0x155555029640 in mca_pml_base_pml_selected base/pml_base_select.c:323
#4 0x155555029640 in mca_pml_base_pml_selected base/pml_base_select.c:318
#5 0x155555029c90 in mca_pml_base_select base/pml_base_select.c:284
#6 0x1555550736ac in ompi_mpi_init runtime/ompi_mpi_init.c:647
#7 0x155554ea80de in PMPI_Init /tmp/svcbuilder/spack-stage-openmpi-4.1.3-sxfyy4knvddpewshfcc45heice7tzs7f/spack-src/ompi/mpi/c/profile/pinit.c:67
#8 0x528ced in main (/gpfs/fs1/home/wuda/ASAN/a.out+0x528ced)
#9 0x155553e92492 in __libc_start_main (/usr/lib64/libc.so.6+0x23492)
#10 0x4069bd in _start (/gpfs/fs1/home/wuda/ASAN/a.out+0x4069bd)
0x61900001f480 is located 0 bytes inside of 1036-byte region [0x61900001f480,0x61900001f88c)
freed by thread T0 here:
#0 0x4eb7ce in __interceptor_realloc /tmp/svcbuilder/spack-stage-gcc-9.2.0-ugetvbp5jl5kgy7jwjloyf73vnhhw7db/spack-src/libsanitizer/asan/asan_malloc_linux.cc:163
#1 0x155552d0b994 in opal_pmix_base_partial_commit_packed base/pmix_base_fns.c:404
previously allocated by thread T0 here:
#0 0x4eb5ae in __interceptor_calloc /tmp/svcbuilder/spack-stage-gcc-9.2.0-ugetvbp5jl5kgy7jwjloyf73vnhhw7db/spack-src/libsanitizer/asan/asan_malloc_linux.cc:153
#1 0x155552d0a604 in pmi_encode base/pmix_base_fns.c:705
SUMMARY: AddressSanitizer: heap-use-after-free /tmp/svcbuilder/spack-stage-gcc-9.2.0-ugetvbp5jl5kgy7jwjloyf73vnhhw7db/spack-src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:354 in __interceptor_strlen
Shadow bytes around the buggy address:
0x0c327fffbe40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbe50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbe60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbe70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c327fffbe80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c327fffbe90:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbea0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbeb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbec0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbed0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbee0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==268787==ABORTING
srun: error: chr-0502: task 0: Exited with exit code 1
Comment
It is not clear whether this issue still reproduces with the latest Open MPI 5.0, because opal_pmix_base_partial_commit_packed, the Open MPI function at the top of the stack trace, was removed by PR #7202.
This seems similar to #10415
Which version of PMIx are you using?
Slurm reports pmi2, not PMIx: Open MPI was built with --with-pmi=/usr, not --with-pmix=...
@chrlogin1:chrys$ srun --mpi=list
srun: MPI types are...
srun: none
srun: pmi2
srun: cray_shasta