sarus
RDMA failed to open device
Hello,
I am trying to run some MPI benchmarks with Sarus containers, in particular with OpenMPI 4.
The nodes are RDMA capable and have InfiniBand. Everything works fine outside the container, and running ibv_devinfo
on the host gives:
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 16.26.0206
node_guid: 0015:5dff:fe33:ff0d
sys_image_guid: 506b:4b03:00fb:f03a
vendor_id: 0x02c9
vendor_part_id: 4120
hw_ver: 0x0
board_id: MT_0000000010
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 700
port_lmc: 0x00
link_layer: InfiniBand
But if I run it inside a container, I get "Failed to open device". So I tried to bind-mount the device, but that does not work without sudo:
[user@controller1 ~]$ sarus run --mount=src=/dev/infiniband/uverbs0,dst=/dev/infiniband/uverbs0,type=bind nichr/hpc-bench:v2 bash
[895.208658764] [controller1-5327] [main] [ERROR] Error trace (most nested error last):
#0 createFoldersIfNecessary at "Utility.cpp":437 Failed to create directory "/opt/sarus/1.3.0-Release/var/OCIBundleDir/rootfs/dev/infiniband"
#1 "unknown function" at "unknown file":-1 boost::filesystem::create_directory: Permission denied: "/opt/sarus/1.3.0-Release/var/OCIBundleDir/rootfs/dev/infiniband"
On the other hand, it works with sudo and the device is recognized inside the container.
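As a quick diagnostic, the sketch below (a hypothetical helper named check_uverbs, not part of Sarus) reports whether InfiniBand verbs device nodes exist under a directory, by default /dev/infiniband as in the mount above, and whether the current user can read and write them. Running it on the host and then inside the container may help tell a missing device node apart from a permission problem. It only inspects the device files; libibverbs also needs other pieces, such as provider libraries and sysfs entries, that this does not check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list verbs device nodes and whether the current
# user can open them for reading and writing. It only checks the device
# files themselves, not provider libraries or /sys/class/infiniband.
check_uverbs() {
    local dir="${1:-/dev/infiniband}"
    local found=0 dev
    for dev in "$dir"/uverbs*; do
        # If nothing matches, the glob stays literal and -e fails.
        [ -e "$dev" ] || continue
        found=1
        if [ -r "$dev" ] && [ -w "$dev" ]; then
            echo "$(basename "$dev"): ok"
        else
            echo "$(basename "$dev"): present but not read/write for $(id -un)"
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no uverbs devices under $dir" >&2
        return 1
    fi
}

# Example (host, then inside the container):
#   check_uverbs /dev/infiniband
```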
1. Is there any other way to mount the device without sudo?
The guide states that I need the SSH hook in order to run OpenMPI. But if I launch Sarus through srun with sudo and the device mount:
[user@controller1 sarus]$ srun sudo /opt/sarus/1.3.0-Release/bin/sarus run --ssh --mount=src=/dev/infiniband/uverbs0,dst=/dev/infiniband/uverbs0,type=bind nichr/hpc-bench:v2 bash -c 'if [ $SLURM_PROCID -eq 0 ]; then mpirun -npernode 1 --allow-run-as-root --map-by node -mca pml ucx --mca btl ^vader,tcp,openib -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_IB_PKEY=$UCX_IB_PKEY /opt/benchmarks/mpiBench/mpiBench -e 1K; else sleep infinity; fi'
I got:
bash: line 0: [: -eq: unary operator expected
bash: line 0: [: -eq: unary operator expected
2. If I use OpenMPI I need the SSH hook, am I right?
I have created the container with the following Dockerfile:
FROM centos:7.6.1810
# set up base
RUN yum install -y epel-release \
&& yum groupinstall -y "Development tools" \
&& yum install -y \
libusbx pciutils-libs pciutils lsof ethtool fuse-libs \
ca-certificates wget openssh-server openssh-clients net-tools \
numactl-devel gtk2 atk cairo tcsh libnl3 tcl libmnl tk
# set up workdir
ENV INSTALL_PREFIX=/opt
WORKDIR /tmp/mpi
# download and install mlnx
RUN wget -q -O - http://content.mellanox.com/ofed/MLNX_OFED-5.1-0.6.6.0/MLNX_OFED_LINUX-5.1-0.6.6.0-rhel7.6-x86_64.tgz | tar -xzf - \
&& ./MLNX_OFED_LINUX-5.1-0.6.6.0-rhel7.6-x86_64/mlnxofedinstall --user-space-only --without-fw-update --all --force \
&& rm -rf MLNX_OFED_LINUX-5.1-0.6.6.0-rhel7.6-x86_64
# download and install HPC-X
ENV HPCX_VERSION="v2.7.0"
RUN cd ${INSTALL_PREFIX} && \
wget -q -O - https://azhpcstor.blob.core.windows.net/azhpc-images-store/hpcx-v2.7.0-gcc9.2.0-MLNX_OFED_LINUX-5.1-0.6.6.0-redhat7.6-x86_64.tbz | tar -xjf -
# shell variables set inside a RUN step do not persist to later steps; use ENV
# so that UCX_PATH and HCOLL_PATH are defined when OpenMPI is configured below
ENV HPCX_PATH=${INSTALL_PREFIX}/hpcx-${HPCX_VERSION}-gcc-MLNX_OFED_LINUX-5.1-0.6.6.0-redhat7.6-x86_64
ENV HCOLL_PATH=${HPCX_PATH}/hcoll \
UCX_PATH=${HPCX_PATH}/ucx
# download and install OpenMPI
ENV OMPI_VERSION="4.0.4"
RUN wget -q -O - https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-${OMPI_VERSION}.tar.gz | tar -xzf - \
&& cd openmpi-${OMPI_VERSION} \
&& ./configure --with-ucx=${UCX_PATH} --with-hcoll=${HCOLL_PATH} --enable-mpirun-prefix-by-default \
&& make -j 8 && make install \
&& cd .. \
&& rm -rf openmpi-${OMPI_VERSION}
# install and setup benchmarks
WORKDIR /opt/benchmarks
# download and install mpiBench
RUN wget -q -O - https://codeload.github.com/LLNL/mpiBench/tar.gz/master | tar -xzf - \
&& mv ./mpiBench-master ./mpiBench \
&& cd mpiBench/ \
&& make
# download and install osu micro benchmarks
RUN wget -q -O - http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.6.3.tar.gz | tar -xzf - \
&& mv ./osu-micro-benchmarks-5.6.3 ./osu-micro-benchmarks \
&& cd osu-micro-benchmarks/ \
&& ./configure CC=mpicc CXX=mpicxx \
&& make \
&& make install
I am new to Sarus and to the HPC world. Thank you for your support!
Hello @NicholasRasi, thank you for opening this issue.
- "Is there any other way to mount the device without sudo?" We are looking into this behavior and will let you know more as soon as possible.
- "If I use OpenMPI I need the SSH hook, am I right?" The error you are getting is related to the Bash syntax of your command. If I understand correctly, the $SLURM_PROCID variable is not defined, so the test becomes "[ -eq 0 ]" and the -eq operator returns an error because it expects two operands. This happens because of sudo, which by default does not preserve environment variables; to preserve them you should use the -E option (see the sudo manpage for reference). I also believe that you are missing the -hostfile option to mpirun within the container, to inform the launcher of the available hosts.
More generally, it is not necessary to use the SSH hook in conjunction with OpenMPI. The cookbook page you are referring to shows how the SSH hook could be used to enable OpenMPI communication, but there are other possibilities. For example, if you want to run with the MPI stack from the container image, you could leverage the PMI2 process management interface, which Sarus is able to propagate into containers. You may find more information about this approach here.
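The failure mode described above can be reproduced without Slurm or sudo. The sketch below assumes that unsetting the variable is a fair stand-in for sudo's default environment reset; the helper names fragile_is_rank0 and is_rank0 are hypothetical. The defensive variant quotes the expansion and reports the missing variable explicitly instead of evaluating a malformed test.

```shell
#!/usr/bin/env bash
# When SLURM_PROCID is unset, the unquoted expansion vanishes and the test
# becomes '[ -eq 0 ]', which Bash rejects with "unary operator expected"
# (exit status 2), as in the srun output above.
fragile_is_rank0() {
    [ $SLURM_PROCID -eq 0 ]
}

# Defensive variant (hypothetical helper): fail with a clear message when
# the variable did not survive, otherwise compare the quoted value.
is_rank0() {
    if [ -z "${SLURM_PROCID:-}" ]; then
        echo "SLURM_PROCID is not set; was the environment preserved (e.g. sudo -E)?" >&2
        return 2
    fi
    [ "$SLURM_PROCID" -eq 0 ]
}

# Simulate a task whose environment was reset, as sudo does by default.
(unset SLURM_PROCID; fragile_is_rank0) || echo "fragile check exited with status $?"
(unset SLURM_PROCID; is_rank0) || echo "defensive check exited with status $?"
```

On the sudo side, the fix discussed here is -E; newer sudo releases also accept --preserve-env=VAR,... to keep only selected variables, subject to the local sudoers policy.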
Hello @Madeeks, thanks for your reply.
- OK, thank you. I look forward to hearing from you.
- Yes, you are right: I was missing the -E option and the hostfile. By the way, if I launch:
salloc -N 2 --cpus-per-task 60
srun sudo -E /opt/sarus/1.3.0-Release/bin/sarus run --ssh \
--mount=src=/home/user,dst=/home/user,type=bind \
--mount=src=/dev/infiniband/uverbs0,dst=/dev/infiniband/uverbs0,type=bind \
nichr/hpc-bench:v2 echo $SLURM_PROCID
the execution hangs (it does not hang without -E).
On the other hand, I tried to run the following bash script:
#!/bin/bash
#SBATCH --job-name=osu_sarus
#SBATCH --nodes=2
#SBATCH --tasks-per-node=1
#SBATCH --time=00:10:00
#SBATCH --output=res_mpi.txt
#SBATCH --err=err_mpi.txt
#SBATCH --partition=hpc
module purge
module load mpi/openmpi
mpirun --map-by node -mca pml ucx --mca btl ^vader,tcp,openib -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_IB_PKEY=$UCX_IB_PKEY \
sudo /opt/sarus/1.3.0-Release/bin/sarus run \
--mount=src=/dev/infiniband/uverbs0,dst=/dev/infiniband/uverbs0,type=bind \
nichr/hpc-bench:v2 \
/opt/benchmarks/mpiBench/mpiBench -e 1K
The execution completed with the following result:
$ cat res_mpi.txt
START mpiBench v1.5
0 : worker1
Barrier Bytes: 0 Iters: 1000 Avg: 0.0061 Min: 0.0061 Max: 0.0061 Comm: MPI_COMM_WORLD Ranks: 1
Bcast Bytes: 0 Iters: 1000 Avg: 0.0138 Min: 0.0138 Max: 0.0138 Comm: MPI_COMM_WORLD Ranks: 1
...
Allgatherv Bytes: 1024 Iters: 1000 Avg: 0.0156 Min: 0.0156 Max: 0.0156 Comm: MPI_COMM_WORLD Ranks: 1
START mpiBench v1.5
0 : worker2
Barrier Bytes: 0 Iters: 1000 Avg: 0.0062 Min: 0.0062 Max: 0.0062 Comm: MPI_COMM_WORLD Ranks: 1
Bcast Bytes: 0 Iters: 1000 Avg: 0.0338 Min: 0.0338 Max: 0.0338 Comm: MPI_COMM_WORLD Ranks: 1
Bcast Bytes: 1 Iters: 1000 Avg: 0.0339 Min: 0.0339 Max: 0.0339 Comm: MPI_COMM_WORLD Ranks: 1
...
Reduce Bytes: 1024 Iters: 1000 Avg: 0.0586 Min: 0.0586 Max: 0.0586 Comm: MPI_COMM_WORLD Ranks: 1
Message buffers (KB): 2
END mpiBench
Message buffers (KB): 2
END mpiBench
$ cat err_mpi.txt
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:
Local host: worker1
Device name: mlx5_0
Device vendor ID: 0x02c9
Device vendor part ID: 4120
Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:
Local host: worker2
Device name: mlx5_0
Device vendor ID: 0x02c9
Device vendor part ID: 4120
Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: worker1
Local device: mlx5_0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: worker2
Local device: mlx5_0
--------------------------------------------------------------------------
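As far as I understand, the workers do not communicate: each process prints its own "START mpiBench" header, reports itself as rank 0 and sees only one rank (Ranks: 1) in MPI_COMM_WORLD.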
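This pattern can be checked mechanically. The sketch below is a hypothetical helper that assumes the res_mpi.txt layout shown above, where each run prints a "START mpiBench" header and every result line ends with "Ranks: N". It counts how many independent runs started and the largest communicator size seen. For a working two-node job one would expect starts=1 and max_ranks=2, while the output above corresponds to starts=2 and max_ranks=1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize an mpiBench output file.
summarize_mpibench() {
    awk '
        # Each independent mpiBench run prints its own START header.
        /^START mpiBench/ { starts++ }
        # Track the largest "Ranks: N" value on any result line.
        {
            for (i = 1; i < NF; i++)
                if ($i == "Ranks:" && $(i + 1) + 0 > max_ranks)
                    max_ranks = $(i + 1) + 0
        }
        END { printf "starts=%d max_ranks=%d\n", starts, max_ranks }
    ' "$1"
}

# Example:
#   summarize_mpibench res_mpi.txt
```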
If I launch the application with srun and --mpi=pmi2:
salloc -N 2 --cpus-per-task 60
srun -N2 --mpi=pmi2 sudo /opt/sarus/1.3.0-Release/bin/sarus run \
--mount=src=/dev/infiniband/uverbs0,dst=/dev/infiniband/uverbs0,type=bind \
nichr/hpc-bench:v2 \
/opt/benchmarks/mpiBench/mpiBench -e 1K
I get a similar result.
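One hypothesis worth checking (it is not confirmed in this thread) is that the launcher information never reaches the containerized processes, for example because sudo without -E resets the environment. The hypothetical helper below lists the names of environment variables with the PMI_, PMIX_, OMPI_ and SLURM_ prefixes; that prefix list is an assumption and may need adjusting for a given MPI and Slurm setup. Comparing its output on the host with its output inside the container, for the same srun step, shows whether those variables are being dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names (not values) of environment
# variables that MPI launchers commonly use to pass rank and job
# information to processes. The prefix list is an assumption.
launcher_env_report() {
    env | awk -F= '/^(PMI_|PMIX_|OMPI_|SLURM_)/ { print $1 }' | sort -u
}

# Example: run the same listing on the host, e.g.
#   srun --mpi=pmi2 bash -c "$(declare -f launcher_env_report); launcher_env_report"
# then pass the same bash -c string to the containerized command
# (sudo sarus run nichr/hpc-bench:v2 bash -c ...) and compare.
```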
I also ran a batch script with MVAPICH2 and the Sarus MPI hook:
#!/bin/bash
#SBATCH --job-name=osu_sarus
#SBATCH --nodes=2
#SBATCH --tasks-per-node=1
#SBATCH --time=00:10:00
#SBATCH --output=res_mpi.txt
#SBATCH --err=err_mpi.txt
#SBATCH --partition=hpc
module purge
module load mpi/mvapich2
srun sarus run --mpi \
nichr/hpc-bench:v4 \
/opt/benchmarks/mpiBench/mpiBench -e 1K
I did not get any error, but the workers are still separated, as in the previous results.
On my cluster I have MVAPICH2 2.3.4, while the guide recommends MVAPICH2 2.2. Do you think this could be a problem? Are the workers separated because Sarus is launched with sudo?
Thank you