WEIS
Open MPI dependency broken (temporary resolution available)
Description
Open MPI causes segmentation faults on Linux machines.
Steps to reproduce issue
- Follow the WEIS develop branch installation instructions. This installs the mpi=1.0-openmpi version along with mpi4py=3.1.6.
- Run any MPI job; it will fail. More (generalized) information can be found on this page.
- A temporary resolution is to install the mpich version of mpi instead of openmpi. When installing WEIS, request the specific build of mpi:
conda install -y petsc4py mpi4py mpi=1.0=mpich pyoptsparse # (Mac / Linux only)
instead of running
conda install -y petsc4py mpi4py pyoptsparse # (Mac / Linux only)
This forces conda to install the mpich version of mpi.
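To confirm which variant conda actually selected, one option is to inspect the build string of the mpi metapackage. The snippet below is a minimal sketch and not part of the original report: mpi_variant is a hypothetical helper name, and it assumes that conda list prints its usual name, version, build, channel columns and that the build column of mpi reads mpich or openmpi (as the mpi=1.0=mpich spec suggests).

```shell
# Hypothetical helper: read `conda list` output on stdin and print the
# build string (third column) of the package named exactly "mpi".
# Prints "not-installed" if no such row is present.
mpi_variant() {
  awk '$1 == "mpi" { print $3; found = 1 } END { if (!found) print "not-installed" }'
}

# Query the active environment only when conda is available on PATH.
if command -v conda >/dev/null 2>&1; then
  echo "mpi variant: $(conda list | mpi_variant)"
fi
```

If this reports openmpi, reinstalling with the mpi=1.0=mpich spec should switch the environment to mpich.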
Current behavior
If WEIS is installed without specifying mpich and conda selects openmpi, MPI jobs fail with:
[log02:599094:0:599094] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x3ff00000001)
[log02:599097:0:599097] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x3ff00000001)
[log02:599095:0:599095] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x3ff00000001)
[log02:599096:0:599096] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x3ff00000001)
Expected behavior
Code versions
For now, the issue can be circumvented by specifying mpich as described above. I hope the dependency issue can be resolved at the conda-forge feedstock level soon.
@yonghoonlee, is this happening on Kestrel or another Linux machine?
@dzalkind Yes, it happened on all my Linux machines as well as the HPC systems I am currently using, including Kestrel and UofM HPC. When you specify mpich, it works fine. Otherwise, conda automatically selects the MPI dependency, which could be either openmpi or mpich. If openmpi is selected, the problem persists.
There are two workarounds I found (and tested) based on discussions with the mpi4py and openmpi communities:
- Use mpich instead of openmpi: Install mpich along with mpi4py. Then openmpi will not be installed.
- Install ucx along with openmpi: It seems that certain versions of ucx installed on many Linux distributions (both Debian- and RedHat-based) cause issues with certain versions of openmpi. Install ucx along with mpi4py; the openmpi installed with mpi4py will then work fine with the most up-to-date version of ucx.
Solution from @yonghoonlee
conda install -y petsc4py mpi4py mpich pyoptsparse # (Mac / Linux only)
(also install mpich)