
OpenMPI hangs during allocation of shared memory if done after allgather

arunjose696 opened this issue 1 year ago • 5 comments

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

I am using v5.0.0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installed from the below tarball

curl -O https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.0.tar.bz2
tar -jxf openmpi-5.0.0.tar.bz2
export PATH=/localdisk/yigoshev/mpi/openmpi-5.0.0-built/bin:$PATH
cd openmpi-5.0.0/
./configure --prefix=<path_to_ompi>
make -j44 all
pip install sphinx_rtd_theme # for some reason openmpi requires this package to install
pip install recommonmark # for some reason openmpi requires this package to install
make -j44 all
make install
export PATH=<path_to_ompi>/bin:$PATH
pip install --no-cache-dir mpi4py
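
Not part of the original report, but a quick sanity check (standard commands only) that the intended Open MPI build and mpi4py binding are the ones being picked up, assuming the PATH export above is in effect:

# verify which Open MPI and which MPI library mpi4py was built against
mpiexec --version
python -c "from mpi4py import MPI; print(MPI.Get_library_version())"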

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version:
  • Computer hardware:
  • Network type:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy


Details of the problem

Open MPI hangs when allocating shared memory on an Intel(R) Xeon(R) Platinum 8468 when >= 128 processes are spawned and an allgather() is performed before the shared memory allocation.

Further triage:
1) The code works fine for nprocs_to_spawn < 128; the hang occurs only with a high number of CPUs.
2) The issue occurs when shared memory is allocated after a call to MPI allgather; if that call is commented out, the issue does not occur.
3) The issue is absent on other CPUs (e.g. Intel(R) Xeon(R) Platinum 8276L).

import mpi4py
from mpi4py import MPI  # noqa: E402
import sys

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
parent_comm = MPI.Comm.Get_parent()
info = MPI.Info.Create()

if parent_comm == MPI.COMM_NULL:
    # Parent process: spawn the workers and merge into a single intracommunicator
    nprocs_to_spawn = 128  # everything works on 127 and lower values
    args = ["reproducer.py"]

    intercomm = MPI.COMM_SELF.Spawn(
        sys.executable,
        args,
        maxprocs=nprocs_to_spawn,
        info=info,
        root=rank,
    )
    comm = intercomm.Merge(high=False)
else:
    # Spawned worker: merge with the parent
    comm = parent_comm.Merge(high=True)

# The code works if the allgather below is commented out
ranks = comm.allgather(comm.Get_rank())

# Rank 1 allocates 100 bytes of shared memory; all other ranks allocate 0 bytes
win = MPI.Win.Allocate_shared(
    100 if rank == 1 else 0,
    MPI.BYTE.size,
    comm=comm,
    info=info,
)

To run:

mpiexec -n 1 --oversubscribe python reproducer.py
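
For completeness, a minimal sketch (not part of the original reproducer) of how the ranks would typically attach to the shared segment once Allocate_shared returns; it continues from the win and rank variables defined above and uses only standard mpi4py calls:

# Only reached if Allocate_shared returns. Rank 1 allocated 100 bytes,
# so every rank queries rank 1's segment of the shared window.
mem, disp_unit = win.Shared_query(1)
view = memoryview(mem)
print(f"rank {rank}: shared segment of {view.nbytes} bytes, disp_unit={disp_unit}")
win.Free()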

arunjose696 • Nov 23 '23 14:11

@arunjose696 Curious if you have tried 4.1.4/5/6 in addition to 5.0.0? It would be very helpful to determine the impact.

wenduwan • Nov 30 '23 21:11

@wenduwan, it also hangs on my side with 4.1.5.

YarShev • Nov 30 '23 22:11

I'm not sure why, but all of these related tests error out for me in the dpm cleanup code. Example with this one:

[st-master][[19684,1],0][btl_tcp_proc.c:400:mca_btl_tcp_proc_create] opal_modex_recv: failed with return value=-46
[st-master][[19684,1],0][btl_tcp_proc.c:400:mca_btl_tcp_proc_create] opal_modex_recv: failed with return value=-46
1 more process has sent help message help-mca-bml-r2.txt / unreachable proc
[st-master][[19684,1],0][btl_tcp_proc.c:400:mca_btl_tcp_proc_create] opal_modex_recv: failed with return value=-46
[st-master:1498089] dpm_disconnect_init: error -12 in isend to process 3
[st-master:1498089] Error in comm_disconnect_waitall
[st-master:1498089:0:1498089] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x10)
==== backtrace (tid:1498089) ====
 0 0x000000000007a6e0 ompi_dpm_dyn_finalize()  :0
 1 0x000000000006120c ompi_comm_finalize()  :0
 2 0x000000000002fff8 opal_finalize_cleanup_domain()  :0
 3 0x0000000000025ddc opal_finalize()  :0
 4 0x00000000000912c8 ompi_rte_finalize()  :0
 5 0x0000000000098a10 ompi_mpi_instance_finalize_common()  :0
 6 0x000000000009a320 ompi_mpi_instance_finalize()  :0
 7 0x000000000008d868 ompi_mpi_finalize()  :0
 8 0x00000000000cc6d8 __pyx_f_6mpi4py_3MPI_atexit()  /users/hpritchard/mpi4py_sandbox/mpi4py/src/mpi4py/MPI.c:22520
 9 0x0000000000208970 Py_FinalizeEx()  ???:0
10 0x000000000020a128 Py_Main()  ???:0
11 0x0000000000000d08 main()  ???:0
12 0x0000000000024384 __libc_start_main()  :0
13 0x0000000000000ea0 _start()  ???:0
=================================
--------------------------------------------------------------------------

One thing I noticed: if your Open MPI build happened to find UCX and configure it in, I'm seeing a hang rather than an abort.

I set

export OMPI_MCA_btl=^uct

and got what I'm reporting above. In previous responses to these test cases I had explicitly disabled UCX support and hence only saw this abort.

The problem appears to be that the dpm cleanup code assumes all-to-all connectivity at the stage of Open MPI finalization where it is invoked.
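
For reference, the same BTL exclusion can be passed on the mpiexec command line using standard Open MPI MCA syntax; the command below is just the reproducer invocation from the original report with that option added:

mpiexec --mca btl "^uct" -n 1 --oversubscribe python reproducer.py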

hppritcha • Dec 01 '23 18:12

The test does not hang for me, nor does it show this issue with the DPM cleanup code, on the 4.1.x branch (which is effectively 4.1.6).

hppritcha • Dec 04 '23 18:12

I tried with 4.1.6 from conda and could observe the same hang.

Did you try this on an Intel(R) Xeon(R) Platinum 8468 machine? As mentioned earlier, the test code in the issue passes for me on other CPUs (e.g. Intel(R) Xeon(R) Platinum 8276L). Could this be a CPU-related issue?

arunjose696 • Dec 07 '23 16:12