failing OpenMPI test when NFS mounts leak into build container

Open boegel opened this issue 3 years ago • 4 comments

FAIL: opal_path_nfs
===================

 Failure : Mismatch: input "/vscmnt/leuven_icts", expected:0 got:1

 Failure : Mismatch: input "/vscmnt/brussel_hydra_home", expected:0 got:1

This looks very much like https://github.com/open-mpi/ompi/issues/318

I was able to work around this by binding /vscmnt to an empty directory when starting the build container (adding /tmp/vscmnt:/vscmnt to SINGULARITY_BIND in start_build_node_env.sh), but we should find a better, more generic solution for this.
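For reference, a minimal sketch of that workaround, assuming the empty directory lives under /tmp and that SINGULARITY_BIND may already hold other comma-separated binds. The helper name `mask_with_empty_dir` is made up for illustration and is not taken from start_build_node_env.sh.

```shell
# Sketch only: hide a host path inside the container by binding an empty
# directory over it. mask_with_empty_dir is a hypothetical helper name.
mask_with_empty_dir() {
    local target
    local empty_dir
    target="$1"                        # path to hide, e.g. /vscmnt (no trailing slash)
    empty_dir="/tmp/${target##*/}"     # e.g. /tmp/vscmnt
    mkdir -p "$empty_dir"              # create the empty directory on the host

    # SINGULARITY_BIND is a comma-separated list of src:dest pairs;
    # append to it rather than overwriting binds that are already set.
    if [ -z "${SINGULARITY_BIND:-}" ]; then
        SINGULARITY_BIND="${empty_dir}:${target}"
    else
        SINGULARITY_BIND="${SINGULARITY_BIND},${empty_dir}:${target}"
    fi
    export SINGULARITY_BIND
}
```

Calling `mask_with_empty_dir /vscmnt` before the container is started reproduces the bind described above. It only masks the one path that is passed in, which is why it is not a generic fix.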

Should we somehow try to ignore the system-wide Singularity configuration file (/etc/singularity/singularity.conf), since that's where the bind of /vscmnt (where these NFS mounts are located) is specified?

The singularity command has a --config option to specify an alternate configuration file, but using that requires being root...

boegel, Dec 15 '21 08:12

Also worth mentioning: this problem does not occur with the 2021.06 pilot version, even though the same HPC-UGent system (our Skylake cluster) was used. The Singularity configuration hasn't changed since then (Aug'21), as far as I can tell...

boegel, Dec 15 '21 08:12

Singularity also has a --no-mount flag, but then you still need to specify which mounts have to be disabled, so that's a bit cumbersome too...

bedroge, Dec 15 '21 08:12

What about -c / --contain? Does that still mount these file systems?

bedroge, Dec 15 '21 08:12

Looks like --contain may work, I'll try rebuilding OpenMPI using a build container started with --contain and see what that gives...
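A quick way to check this before doing a full rebuild could be to list the NFS mount points visible inside the container, once with and once without --contain. The sketch below assumes a Linux /proc/mounts is available in the container; `list_nfs_mounts` is a hypothetical helper, and its optional file argument is only there to make it easy to try out.

```shell
# Sketch only: print the mount points of all NFS file systems that are
# visible from where this runs. list_nfs_mounts is a hypothetical helper.
list_nfs_mounts() {
    # Optional argument: a file in /proc/mounts format (defaults to /proc/mounts).
    mounts_file="${1:-/proc/mounts}"
    # Fields in /proc/mounts: device, mount point, file system type, options, ...
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "$mounts_file"
}
```

If the function prints nothing under /vscmnt inside a container started with --contain, the NFS mounts are no longer leaking in. Whether that is enough to make opal_path_nfs pass still needs the actual rebuild.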

boegel, Dec 15 '21 08:12