
MPI_Win_create fails with -np 1

Status: Open • ximtec opened this issue 3 years ago • 5 comments

Hi,

I am trying to compile and run an experiment in a simulation framework. Other frameworks I have tried run without errors, but this one uses some of the most recent and advanced MPI routines, so there seem to be some problems. I have asked some of the developers who have installed the same version of Open MPI, and they run the program without errors, so something must be wrong with my installation (and not with the simulation framework).

I am using Open MPI 4.1.4, the latest stable release from https://www.open-mpi.org/software/ompi/v4.1/. I am also using UCX, and I have tried both UCX 1.12.1 and 1.13.1. I am using GCC 11.2, which is also set as the default gcc, gfortran and cpp compiler. This is a non-standard configuration of GCC, as I compile it with OpenMP offload features enabled.

UCX is configured and installed as:

    $ ./contrib/configure-release --prefix=$PWD/install
    $ make -j install

Open MPI is configured and installed with:

    $ ./configure FC=gfortran CC=gcc --prefix=$PWD/install --with-ucx=UCX-INSTALL-DIR --enable-mt --enable-mca-no-build=btl-uct
    $ make -j all
    $ make install

Open MPI is then added as a local module with the following file:

    #%Module 1.0
    #
    # OpenMPI module for use with 'environment-modules' package:
    #
    prepend-path PATH /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/bin
    prepend-path LD_LIBRARY_PATH /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/lib
    prepend-path MANPATH /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/share/man
    setenv MPI_BIN /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/bin
    setenv MPI_SYSCONFIG /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/etc
    setenv MPI_INCLUDE /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/include
    setenv MPI_LIB /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/lib
    setenv MPI_MAN /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install/share/man
    setenv MPI_COMPILER openmpi-4.1.4
    setenv MPI_SUFFIX _openmpi
    setenv MPI_HOME /mn/stornext/u3/michhaa/from_source/openmpi-4.1.4/install
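Since the suspicion is that the installation itself is at fault, one quick sanity check is whether the mpirun found first on PATH really comes from the install directory the module file points at. This is a minimal sketch: which_mpirun is a hypothetical helper, and the only thing it relies on is the MPI_HOME variable that the module file sets.

```shell
#!/bin/sh
# Sketch: check that the mpirun found first on PATH lives under $MPI_HOME,
# i.e. that the module actually selects this Open MPI build.
# which_mpirun is a hypothetical helper name; MPI_HOME is set by the module file.
which_mpirun() {
    # command -v prints the resolved path, or fails if mpirun is not on PATH
    mpirun_path=$(command -v mpirun 2>/dev/null) || {
        echo "mpirun not on PATH"
        return 0
    }
    # Compare the resolved path against the MPI_HOME prefix from the module
    case "$mpirun_path" in
        "${MPI_HOME:-/nonexistent}"/*) echo "mpirun from MPI_HOME: $mpirun_path" ;;
        *) echo "mpirun outside MPI_HOME: $mpirun_path" ;;
    esac
}

which_mpirun
```

If it reports an mpirun outside MPI_HOME, another Open MPI installation is shadowing the module and would be the first thing to rule out.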

I am trying to install it on my workstation, so on a single node:

    Operating System: Red Hat Enterprise Linux
    CPE OS Name: cpe:/o:redhat:enterprise_linux:7.9:GA:workstation
    Kernel: Linux 3.10.0-1160.53.1.el7.x86_64
    Architecture: x86-64


Details of the problem

The problem occurs when I try to run with a single rank. If I run with multiple ranks (e.g. -np 2), the code runs without errors.

    shell$ mpirun -np 1 ./dispatch.x

    [canopus:10667] *** An error occurred in MPI_Win_create
    [canopus:10667] *** reported by process [3852664833,0]
    [canopus:10667] *** on communicator MPI_COMM_WORLD
    [canopus:10667] *** MPI_ERR_WIN: invalid window
    [canopus:10667] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
    [canopus:10667] ***    and potentially your MPI job)

Thanks in advance

ximtec, Aug 29 '22 07:08

I can reproduce this issue, but only if I disable the osc/ucx component (using --mca osc rdma). Are you sure Open MPI is built with UCX support? Can you pass --mca osc ucx to mpirun to force the UCX component to be used? I will look into the osc/rdma issue later.
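To answer the "is Open MPI built with UCX support?" question, one option is to look for the osc/ucx component in ompi_info output. A minimal sketch, assuming the ompi_info from the same installation is first on PATH and lists each built component on a line containing "MCA osc: <name>"; osc_ucx_status is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether the Open MPI build on PATH includes the osc/ucx
# one-sided component. Assumes ompi_info lists each built component on a
# line containing "MCA osc: <name>".
# osc_ucx_status is a hypothetical helper name.
osc_ucx_status() {
    if ! command -v ompi_info >/dev/null 2>&1; then
        echo "ompi_info not found"
        return 0
    fi
    # Look for the osc/ucx entry among the listed components
    if ompi_info | grep -q "MCA osc: ucx"; then
        echo "osc/ucx built"
    else
        echo "osc/ucx missing"
    fi
}

osc_ucx_status
```

If the component is present, mpirun --mca osc ucx -np 1 ./dispatch.x forces it, while mpirun --mca osc rdma -np 1 ./dispatch.x selects osc/rdma, the component with which the failure was reproduced in the comment above.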

devreal, Aug 29 '22 13:08

This seems to be 4.1.x-specific, I cannot reproduce it with 5.0.x or main.

devreal, Aug 29 '22 13:08

Hi, thank you for the reply. I talked with some of our in-house developers, and the issue was the configuration of UCX. I added --enable-mt to the UCX configure options instead of the Open MPI configure options, and that fixed the issue.
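To confirm that an installed UCX was actually configured with --enable-mt, its configure flags can be read back from ucx_info. This is a sketch under an assumption: that ucx_info -v prints a "configured with:" line listing the configure flags, and that the ucx_info from the UCX install Open MPI was built against is first on PATH. ucx_mt_status is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check whether the UCX on PATH was configured with --enable-mt.
# Assumption: "ucx_info -v" prints a "configured with:" line that lists the
# configure flags. ucx_mt_status is a hypothetical helper name.
ucx_mt_status() {
    if ! command -v ucx_info >/dev/null 2>&1; then
        echo "ucx_info not found"
        return 0
    fi
    # Keep only the configure line, then search it for the thread-support flag
    if ucx_info -v | grep "configured with" | grep -q -e "--enable-mt"; then
        echo "UCX configured with --enable-mt"
    else
        echo "UCX configured without --enable-mt"
    fi
}

ucx_mt_status
```

If the flag is missing, the fix described above amounts to rebuilding UCX with ./contrib/configure-release --prefix=$PWD/install --enable-mt followed by make -j install.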

Closing

ximtec, Aug 30 '22 05:08

Reopening since this is a real issue in osc/rdma.

devreal, Aug 30 '22 11:08

It would be nice to have a better error message at the very least.

gpaulsen, Aug 30 '22 14:08