ompi
MPI_Win_create fails with -np 1
Hi,
I am trying to compile and run an experiment in a simulation framework. Other frameworks I have tried run without errors, but this one uses some of the most recent and advanced MPI routines, which seems to cause problems. Some of the developers have installed the same version of openmpi and run the program without errors, so something must be wrong with my installation (and not with the simulation framework).
I am using openmpi 4.1.4, the latest stable release from https://www.open-mpi.org/software/ompi/v4.1/
I am also using UCX, where I have tried both UCX 1.12.1 and 1.13.1.
I am using GCC 11.2, which is also set as the default gcc, gfortran and cpp compiler. My GCC configuration is non-standard, as I compile it with openmp-offload features enabled.
UCX is configured and installed as:
$ ./contrib/configure-release --prefix=$PWD/install
$ make -j install
openmpi is configured and installed with:
$ ./configure FC=gfortran CC=gcc --prefix=$PWD/install --with-ucx=UCX-INSTALL-DIR --enable-mt --enable-mca-no-build=btl-uct
$ make -j all
$ make install
openmpi is then added as a local module with the file:
#%Module 1.0
# OpenMPI module for use with 'environment-modules' package:
prepend-path PATH /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/bin
prepend-path LD_LIBRARY_PATH /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/lib
prepend-path MANPATH /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/share/man
setenv MPI_BIN /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/bin
setenv MPI_SYSCONFIG /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/etc
setenv MPI_INCLUDE /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/include
setenv MPI_LIB /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/lib
setenv MPI_MAN /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/share/man
setenv MPI_COMPILER openmpi-4.1.4
setenv MPI_SUFFIX _openmpi
setenv MPI_HOME /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install
I am trying to install it on my workstation, so a single node:
Operating System: Red Hat Enterprise Linux
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.9:GA:workstation
Kernel: Linux 3.10.0-1160.53.1.el7.x86_64
Architecture: x86-64
Details of the problem
The problem occurs when I try to run with a single rank. If I run with multiple ranks (e.g. -np 2), the code runs without errors.
shell$ mpirun -np 1 ./dispatch.x
[canopus:10667] *** An error occurred in MPI_Win_create
[canopus:10667] *** reported by process [3852664833,0]
[canopus:10667] *** on communicator MPI_COMM_WORLD
[canopus:10667] *** MPI_ERR_WIN: invalid window
[canopus:10667] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[canopus:10667] *** and potentially your MPI job)
Thanks in advance
I can reproduce this issue, but only if I disable the osc/ucx component (using --mca osc rdma). Are you sure Open MPI is built with UCX support? Can you pass --mca osc ucx to mpirun to force the UCX component to be used? I will look into the osc/rdma issue later.
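One way to check which one-sided (osc) components a build provides is to scan the output of ompi_info, which lists each MCA component on a line of the form "MCA osc: ucx (...)". The sketch below is a hypothetical helper, not part of Open MPI: the function names osc_components and mpirun_command are made up for illustration, and ./dispatch.x is simply the binary named in this report.

```python
import re
import shutil
import subprocess

# Matches ompi_info component lines such as:
#   "                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.1.4)"
_MCA_LINE = re.compile(r"^\s*MCA\s+(\S+):\s+(\S+)", re.MULTILINE)


def osc_components(ompi_info_text):
    """Return the names of all osc components listed in ompi_info output."""
    return [name for framework, name in _MCA_LINE.findall(ompi_info_text)
            if framework == "osc"]


def mpirun_command(osc, nprocs=1, program="./dispatch.x"):
    """Build an mpirun argument list that forces a single osc component."""
    return ["mpirun", "--mca", "osc", osc, "-np", str(nprocs), program]


if __name__ == "__main__":
    if shutil.which("ompi_info") is None:
        print("ompi_info not found on PATH")
    else:
        text = subprocess.run(["ompi_info"], capture_output=True,
                              text=True, check=False).stdout
        found = osc_components(text)
        print("osc components:", ", ".join(found) or "none")
        if "ucx" in found:
            # Only prints the command; running it is left to the user.
            print("try:", " ".join(mpirun_command("ucx")))
```

If ucx does not appear in the list, the build most likely lacks UCX support for one-sided communication, and mpirun would fall back to another component such as osc/rdma.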
This seems to be 4.1.x-specific, I cannot reproduce it with 5.0.x or main.
Hi, thank you for the reply. I had a talk with some of our in-house developers. The issue was the configuration of UCX: I added --enable-mt to the UCX configure options instead of the openmpi configure options, and that fixed the issue.
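To confirm how an installed UCX was configured, ucx_info -v prints the configure flags the library was built with. The following is a minimal sketch that checks that output for --enable-mt. It assumes the output contains a "configured with:" line (the exact wording and capitalization may differ between UCX versions), and the function names are hypothetical.

```python
import shutil
import subprocess


def ucx_configure_flags(ucx_info_text):
    """Extract configure flags from `ucx_info -v` output.

    Assumes a line whose label is "configured with" (case-insensitive,
    optionally prefixed by "#"); returns [] when no such line exists.
    """
    for line in ucx_info_text.splitlines():
        head, sep, tail = line.partition(":")
        if sep and head.lower().lstrip("# ").strip() == "configured with":
            return tail.split()
    return []


def built_with_mt(ucx_info_text):
    """True if the exact flag --enable-mt appears in the configure line."""
    return "--enable-mt" in ucx_configure_flags(ucx_info_text)


if __name__ == "__main__":
    if shutil.which("ucx_info") is None:
        print("ucx_info not found on PATH")
    else:
        text = subprocess.run(["ucx_info", "-v"], capture_output=True,
                              text=True, check=False).stdout
        print("UCX built with --enable-mt:", built_with_mt(text))
```

The check only matches the literal flag, so a variant spelling such as --enable-mt=yes would need to be handled separately.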
Closing
Reopening since this is a real issue in osc/rdma.
It would be nice to have a better error message at the very least.