Try to install mpi4py+openmpi using conda on Ubuntu 20.04 but failed

Open shuheng-mo opened this issue 2 years ago • 4 comments

Thank you for taking the time to submit an issue!

Background information

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:        20.04
Codename:       focal

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

Installed from conda-forge using conda create -n ENV_NAME -c conda-forge 'python=3.10.*' openmpi mpi4py

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

As stated above.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version:
  • Computer hardware:
  • Network type:

For hardware details:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          240
On-line CPU(s) list:             0-239
Thread(s) per core:              2
Core(s) per socket:              60
Socket(s):                       2
NUMA node(s):                    8
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7742 64-Core Processor
Stepping:                        0
CPU MHz:                         2245.780
BogoMIPS:                        4491.56
Virtualization:                  AMD-V
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       7.5 MiB
L1i cache:                       7.5 MiB
L2 cache:                        60 MiB
L3 cache:                        128 MiB
NUMA node0 CPU(s):               0-29
NUMA node1 CPU(s):               30-59
NUMA node2 CPU(s):               60-89
NUMA node3 CPU(s):               90-119
NUMA node4 CPU(s):               120-149
NUMA node5 CPU(s):               150-179
NUMA node6 CPU(s):               180-209
NUMA node7 CPU(s):               210-239
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
483G

Details of the problem

After activating the conda environment, running either mpiexec or mpirun returns:

--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS

I tried setting LD_PRELOAD to the corresponding libmpi.so, either the conda one (/root/anaconda3/envs/mpi-test/lib/libmpi.so) or the system one (export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so). Neither worked.

shuheng-mo avatar Aug 30 '22 13:08 shuheng-mo

Are you running an MPI program or a Python mpi4py script?

If the former, did you build your program within the conda environment?

From the conda environment, what does type mpiexec and ompi_info --all report?
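The checks requested above could be gathered in one go with something like the following. This is only a sketch: path_in_prefix is a hypothetical helper, the output file name ompi_info_all.txt is arbitrary, and it assumes CONDA_PREFIX is set by conda activate.

```shell
#!/bin/sh
# Hypothetical helper: succeed if path $1 lies inside directory prefix $2.
path_in_prefix() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Resolve the launcher the shell would actually run.
mpiexec_path=$(command -v mpiexec 2>/dev/null || true)

if [ -z "$mpiexec_path" ]; then
    echo "mpiexec: not found on PATH"
elif [ -n "${CONDA_PREFIX:-}" ] && path_in_prefix "$mpiexec_path" "$CONDA_PREFIX"; then
    echo "mpiexec comes from the active conda env: $mpiexec_path"
else
    echo "mpiexec is outside the active conda env: $mpiexec_path"
fi

# Variables that can change which libmpi.so gets loaded at run time.
echo "LD_PRELOAD=${LD_PRELOAD:-<unset>}"
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"

# Save the full Open MPI configuration dump so it can be attached here.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --all > ompi_info_all.txt 2>&1
    echo "ompi_info --all written to ompi_info_all.txt"
else
    echo "ompi_info: not found on PATH"
fi
```

If mpiexec resolves to a path outside the environment (for example a system Open MPI under /usr), the launcher and the libraries may come from different installations.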

ggouaillardet avatar Aug 30 '22 13:08 ggouaillardet

@ggouaillardet Thanks for your help. I installed with pip from the command line instead, and that succeeded. I am running a .py script. I am not sure whether conda caused the problem, so I have decided not to use conda for now. You may want to keep this issue open and see whether you can reproduce it on Ubuntu 20.04.3 LTS: install with conda create -n ENV_NAME -c conda-forge 'python=3.10.*' openmpi mpi4py, activate the environment, set the flags that allow running as root, then run mpiexec or mpirun. That should produce the error above. Running them as mpirun -np or mpiexec -n should instead return an "n" / "np" not found message.
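For anyone trying to reproduce this, the steps above might look roughly like the sketch below. The function name reproduce_issue and the default environment name mpi-test are placeholders, and the two OMPI_ALLOW_RUN_AS_ROOT variables are an assumption about which "allow run as root" flags were set. The function expects to be called from bash, since it uses conda's bash hook.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction; nothing runs until reproduce_issue is called.
reproduce_issue() {
    # Placeholder default, borrowed from the env path mentioned in the report.
    local env_name="${1:-mpi-test}"

    # Make `conda activate` usable in a non-interactive shell.
    eval "$(conda shell.bash hook)" || return 1

    # Same package set as in the report; -y skips the confirmation prompt.
    conda create -y -n "$env_name" -c conda-forge 'python=3.10.*' openmpi mpi4py || return 1
    conda activate "$env_name" || return 1

    # Assumption: these are the "allow run as root" flags meant in the report.
    export OMPI_ALLOW_RUN_AS_ROOT=1
    export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

    # Show which launcher is picked up, then start two ranks that each
    # print their rank and the communicator size.
    type mpiexec
    mpiexec -n 2 python -c 'from mpi4py import MPI; c = MPI.COMM_WORLD; print(c.Get_rank(), c.Get_size())'
}
```

Calling reproduce_issue (optionally with an environment name) on a fresh machine should either print two rank/size lines or fail with the opal_init message.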

shuheng-mo avatar Aug 30 '22 17:08 shuheng-mo

FWIW, I tried and it works for me (I used miniconda from wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh)

How did you install conda?

ggouaillardet avatar Aug 31 '22 02:08 ggouaillardet

@ggouaillardet I installed it using Anaconda (https://www.anaconda.com/products/distribution). I will try Miniconda and see if it works.

shuheng-mo avatar Aug 31 '22 10:08 shuheng-mo