Is UCX working with MPI-Sessions?
UCX and MPI-Sessions
When I try to use Open MPI with UCX on our small university cluster, I get an error message saying that MPI Sessions features are not supported by UCX (the cluster uses an InfiniBand interconnect). However, when I install the same setup on my local machine (Arch Linux), everything seems to work fine. So I'm wondering: are MPI Sessions supported by UCX or not?
Source Code (main.c):
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Error handler attached to the session; prints the error code. */
void function_my_session_errhandler(MPI_Session *foo, int *bar, ...) {
    fprintf(stderr, "my error handler called here with error %d\n", *bar);
}

/* Abort with a message if an MPI call did not return MPI_SUCCESS. */
void function_check_print_error(char *format, int rc) {
    if (MPI_SUCCESS != rc) {
        fprintf(stderr, format, rc);
        abort();
    }
}

int main(int argc, char *argv[]) {
    MPI_Session session;
    MPI_Errhandler errhandler;
    MPI_Group group;
    MPI_Comm comm_world, comm_self;
    MPI_Info info;
    int rc, npsets, one = 1, sum;

    rc = MPI_Session_create_errhandler(function_my_session_errhandler, &errhandler);
    function_check_print_error("Error handler creation failed with rc = %d\n", rc);

    rc = MPI_Info_create(&info);
    function_check_print_error("Info creation failed with rc = %d\n", rc);
    rc = MPI_Info_set(info, "thread_level", "MPI_THREAD_MULTIPLE");
    function_check_print_error("Info key/val set failed with rc = %d\n", rc);

    rc = MPI_Session_init(info, errhandler, &session);
    function_check_print_error("Session initialization failed with rc = %d\n", rc);

    rc = MPI_Session_get_num_psets(session, MPI_INFO_NULL, &npsets);
    function_check_print_error("Get number of psets failed with rc = %d\n", rc);

    for (int i = 0; i < npsets; i++) {
        int psetlen = 0;
        char pset_name[256];
        /* First call queries the name length, second call retrieves the name. */
        MPI_Session_get_nth_pset(session, MPI_INFO_NULL, i, &psetlen, NULL);
        MPI_Session_get_nth_pset(session, MPI_INFO_NULL, i, &psetlen, pset_name);
        fprintf(stderr, " PSET %d: %s (len: %d)\n", i, pset_name, psetlen);
    }

    rc = MPI_Group_from_session_pset(session, "mpi://WORLD", &group);
    function_check_print_error("Could not get a group for mpi://WORLD. rc = %d\n", rc);
    rc = MPI_Comm_create_from_group(group, "my_world", MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm_world);
    function_check_print_error("Could not create Communicator my_world. rc = %d\n", rc);
    MPI_Group_free(&group);

    MPI_Allreduce(&one, &sum, 1, MPI_INT, MPI_SUM, comm_world);
    fprintf(stderr, "World Comm Sum (1): %d\n", sum);

    rc = MPI_Group_from_session_pset(session, "mpi://SELF", &group);
    function_check_print_error("Could not get a group for mpi://SELF. rc = %d\n", rc);
    rc = MPI_Comm_create_from_group(group, "myself", MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm_self);
    function_check_print_error("Could not create Communicator myself. rc = %d\n", rc);
    MPI_Group_free(&group);

    MPI_Allreduce(&one, &sum, 1, MPI_INT, MPI_SUM, comm_self);
    fprintf(stderr, "Self Comm Sum (1): %d\n", sum);

    MPI_Errhandler_free(&errhandler);
    MPI_Info_free(&info);
    MPI_Comm_free(&comm_world);
    MPI_Comm_free(&comm_self);
    MPI_Session_finalize(&session);
    return 0;
}
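As a side note, the handler above only reports the numeric error code. A variant that also decodes the code into a human-readable message via MPI_Error_string could look like this (a sketch, not part of the original reproducer):

/* Hypothetical variant of the session error handler: additionally
 * translates the error code into a readable message. */
void function_my_session_errhandler_verbose(MPI_Session *foo, int *bar, ...) {
    char msg[MPI_MAX_ERROR_STRING];
    int len = 0;
    MPI_Error_string(*bar, msg, &len);
    fprintf(stderr, "my error handler called with error %d: %s\n", *bar, msg);
}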
Commands used to compile and run:
mpicc -o main main.c
mpirun -np 1 -mca pml ucx ./main     (on the cluster)
mpirun -np 1 -mca osc ucx ./main     (on the local machine)
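To see which PML Open MPI actually selects at run time (a diagnostic aid, not one of the original commands), the MCA verbosity switch can be used; the selection is printed during startup:

mpirun -np 1 -mca pml ucx -mca pml_base_verbose 10 ./main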
Console Output (University Cluster):
$ mpirun -np 1 -mca pml ucx main
PSET 0: mpi://WORLD (len: 12)
PSET 1: mpi://SELF (len: 11)
PSET 2: mpix://SHARED (len: 14)
Could not create Communicator my_world. rc = 52
[nv46:97180] *** Process received signal ***
[nv46:97180] Signal: Aborted (6)
[nv46:97180] Signal code: (-6)
--------------------------------------------------------------------------
Your application has invoked an MPI function that is not supported in
this environment.
MPI function: MPI_Comm_from_group/MPI_Intercomm_from_groups
Reason: The PML being used - ucx - does not support MPI sessions related features
--------------------------------------------------------------------------
[nv46:97180] [ 0] /usr/lib/libc.so.6(+0x3c770)[0x72422de41770]
[nv46:97180] [ 1] /usr/lib/libc.so.6(+0x8d32c)[0x72422de9232c]
[nv46:97180] [ 2] /usr/lib/libc.so.6(gsignal+0x18)[0x72422de416c8]
[nv46:97180] [ 3] /usr/lib/libc.so.6(abort+0xd7)[0x72422de294b8]
[nv46:97180] [ 4] main(+0x12f4)[0x6239e33802f4]
[nv46:97180] [ 5] main(+0x1585)[0x6239e3380585]
[nv46:97180] [ 6] /usr/lib/libc.so.6(+0x25cd0)[0x72422de2acd0]
[nv46:97180] [ 7] /usr/lib/libc.so.6(__libc_start_main+0x8a)[0x72422de2ad8a]
[nv46:97180] [ 8] main(+0x1165)[0x6239e3380165]
[nv46:97180] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 97180 on node nv46 exited on
signal 6 (Aborted).
Console Output (Local Machine):
$ mpirun -np 1 -mca osc ucx main
PSET 0: mpi://WORLD (len: 12)
PSET 1: mpi://SELF (len: 11)
PSET 2: mpix://SHARED (len: 14)
World Comm Sum (1): 1
Self Comm Sum (1): 1
Installation
Small University Cluster
UCX Output
Output of configure-release:
configure: ASAN check: no
configure: Multi-thread: disabled
configure: MPI tests: disabled
configure: VFS support: yes
configure: Devel headers: no
configure: io_demo CUDA support: no
configure: Bindings: < >
configure: UCS modules: < fuse >
configure: UCT modules: < ib rdmacm cma >
configure: CUDA modules: < >
configure: ROCM modules: < >
configure: IB modules: < >
configure: UCM modules: < >
configure: Perf modules: < >
Output after make install:
$UCXFOLDER/myinstall/bin/ucx_info -v
# Library version: 1.17.0
# Library path: ${HOME}/itoyori/ucx/myinstall/lib/libucs.so.0
# API headers version: 1.17.0
# Git branch 'master', revision a48ad8f
# Configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --prefix=${HOME}/itoyori/ucx/myinstall --without-go
OpenMPI
Output of configure:
Open MPI configuration:
-----------------------
Version: 5.0.3
MPI Standard Version: 3.1
Build MPI C bindings: yes
Build MPI Fortran bindings: mpif.h, use mpi, use mpi_f08
Build MPI Java bindings (experimental): no
Build Open SHMEM support: yes
Debug build: no
Platform file: (none)
Miscellaneous
-----------------------
Atomics: GCC built-in style atomics
Fault Tolerance support: mpi
HTML docs and man pages: installing packaged docs
hwloc: external
libevent: external
Open UCC: no
pmix: external
PRRTE: external
Threading Package: pthreads
Transports
-----------------------
Cisco usNIC: no
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no (not found)
Open UCX: yes
OpenFabrics OFI Libfabric: yes (pkg-config: default search paths)
Portals4: no (not found)
Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: no
Shared memory/XPMEM: no
TCP: yes
Accelerators
-----------------------
CUDA support: no
ROCm support: no
OMPIO File Systems
-----------------------
DDN Infinite Memory Engine: no
Generic Unix FS: yes
IBM Spectrum Scale/GPFS: no (not found)
Lustre: no (not found)
PVFS2/OrangeFS: no
Local
UCX Output
Output of configure-release:
configure: =========================================================
configure: UCX build configuration:
configure: Build prefix: ${HOME}/ucx/myinstall
configure: Configuration dir: ${prefix}/etc/ucx
configure: Preprocessor flags: -DCPU_FLAGS="" -I${abs_top_srcdir}/src -I${abs_top_builddir} -I${abs_top_builddir}/src
configure: C compiler: gcc -O3 -g -Wall -Werror -funwind-tables -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-label -Wno-long-long -Wno-endif-labels -Wno-sign-compare -Wno-multichar -Wno-deprecated-declarations -Winvalid-pch -Wno-pointer-sign -Werror-implicit-function-declaration -Wno-format-zero-length -Wnested-externs -Wshadow -Werror=declaration-after-statement
configure: C++ compiler: g++ -O3 -g -Wall -Werror -funwind-tables -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-label -Wno-long-long -Wno-endif-labels -Wno-sign-compare -Wno-multichar -Wno-deprecated-declarations -Winvalid-pch
configure: Multi-thread: disabled
configure: MPI tests: disabled
configure: VFS support: yes
configure: Devel headers: no
configure: io_demo CUDA support: no
configure: Bindings: < >
configure: UCS modules: < fuse >
configure: UCT modules: < cma >
configure: CUDA modules: < >
configure: ROCM modules: < >
configure: IB modules: < >
configure: UCM modules: < >
configure: Perf modules: < >
configure: =========================================================
Output after make install:
$UCXFOLDER/myinstall/bin/ucx_info -v
# Library version: 1.16.0
# Library path: ${HOME}/ucx/myinstall/lib/libucs.so.0
# API headers version: 1.16.0
# Git branch '', revision e4bb802
# Configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --prefix=${HOME}/ucx/myinstall --without-go
OpenMPI
Output of configure:
Open MPI configuration:
-----------------------
Version: 5.0.3
MPI Standard Version: 3.1
Build MPI C bindings: yes
Build MPI Fortran bindings: no
Build MPI Java bindings (experimental): no
Build Open SHMEM support: yes
Debug build: no
Platform file: (none)
Miscellaneous
-----------------------
Atomics: GCC built-in style atomics
Fault Tolerance support: mpi
HTML docs and man pages: installing packaged docs
hwloc: internal
libevent: external
Open UCC: no
pmix: internal
PRRTE: internal
Threading Package: pthreads
Transports
-----------------------
Cisco usNIC: no
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no (not found)
Open UCX: yes
OpenFabrics OFI Libfabric: no (not found)
Portals4: no (not found)
Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: no
Shared memory/XPMEM: no
TCP: yes
Accelerators
-----------------------
CUDA support: no
ROCm support: no
OMPIO File Systems
-----------------------
DDN Infinite Memory Engine: no
Generic Unix FS: yes
IBM Spectrum Scale/GPFS: no (not found)
Lustre: no (not found)
PVFS2/OrangeFS: no
MPI and UCX Installation
Folder structure:
${HOME}/ucx
${HOME}/openmpi-5.0.3
Install OpenUCX
cd ${HOME}
git clone https://github.com/openucx/ucx.git
cd ucx
git checkout v1.16.0
export UCXFOLDER=${HOME}/ucx
./autogen.sh
./contrib/configure-release --prefix=$UCXFOLDER/myinstall --without-go
Install:
make -j32
make install
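To double-check which transports the resulting UCX build actually enables (a suggested verification step, assuming the install prefix above), ucx_info can list them:

$UCXFOLDER/myinstall/bin/ucx_info -d | grep -i transport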
OpenMPI
cd ${HOME}
wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.3.tar.gz
tar xfvz openmpi-5.0.3.tar.gz
export MPIFOLDER=${HOME}/openmpi-5.0.3
cd $MPIFOLDER
./configure --disable-io-romio --with-io-romio-flags=--without-ze --disable-sphinx --prefix="$MPIFOLDER/myinstall" --with-ucx="$UCXFOLDER/myinstall" 2>&1 | tee config.out
Install:
make -j32 all 2>&1 | tee make.out
make install 2>&1 | tee install.out
export OMPI="${MPIFOLDER}/myinstall"
export PATH=$OMPI/bin:$PATH
export LD_LIBRARY_PATH=$OMPI/lib:$LD_LIBRARY_PATH
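To confirm that this Open MPI build picked up the intended UCX installation (again a suggested check, not from the original steps), ompi_info can be queried for UCX components:

ompi_info | grep -i ucx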
So I'm wondering: are MPI Sessions supported by UCX or not?

That's correct: MPI Sessions are not supported with the UCX PML. It's on the feature list for the next major release.
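Until then, a possible workaround (untested on the cluster described above) is to avoid the UCX PML and let the session-capable ob1 PML handle point-to-point traffic, at the cost of losing native InfiniBand performance:

mpirun -np 1 -mca pml ob1 -mca btl self,sm,tcp ./main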
Thanks for the answers.
Let's keep this open until it's fixed. Other people will probably run into this too.
Is there any way to run Open MPI 5 with MPI Sessions?
Closed via #12723.
No plans currently to push these changes back to v5.0.x branch.