software-layer
failing OpenMPI test when NFS mounts leak into build container
FAIL: opal_path_nfs
===================
Failure : Mismatch: input "/vscmnt/leuven_icts", expected:0 got:1
Failure : Mismatch: input "/vscmnt/brussel_hydra_home", expected:0 got:1
This looks very much like https://github.com/open-mpi/ompi/issues/318
I was able to work around this by binding an empty directory over /vscmnt when starting the build container (adding /tmp/vscmnt:/vscmnt to SINGULARITY_BIND in start_build_node_env.sh), but we should find a better, more generic solution for this.
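For reference, a minimal sketch of that workaround. The helper name add_singularity_bind is made up for illustration and is not what start_build_node_env.sh actually contains; the only facts taken from above are the bind spec /tmp/vscmnt:/vscmnt and the SINGULARITY_BIND variable, which holds a comma-separated list of bind specs.

```shell
# Hypothetical helper: append one bind spec to the comma-separated
# SINGULARITY_BIND list without clobbering binds that are already set.
add_singularity_bind() {
    bind_spec="$1"
    if [ -z "${SINGULARITY_BIND:-}" ]; then
        SINGULARITY_BIND="$bind_spec"
    else
        SINGULARITY_BIND="${SINGULARITY_BIND},${bind_spec}"
    fi
    export SINGULARITY_BIND
}

# Mask the host's /vscmnt with an empty directory inside the container.
mkdir -p /tmp/vscmnt
add_singularity_bind "/tmp/vscmnt:/vscmnt"
```

This only hides /vscmnt; any other site-specific NFS bind would need its own entry, which is why it is not a generic fix.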
Should we somehow try to ignore the system-wide Singularity configuration file (/etc/singularity/singularity.conf), since that's where the bind of /vscmnt (where these NFS mounts are located) is specified?
The singularity command has a --config option to specify an alternate configuration file, but using that requires being root...
Also worth mentioning: this problem does not occur with the 2021.06 pilot version, even though the same HPC-UGent system (our Skylake cluster) was used. The Singularity configuration hasn't changed since then (Aug'21), as far as I can tell...
Singularity also has a --no-mount flag, but then you still need to specify which mounts have to be disabled, so that's a bit cumbersome too...
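A rough sketch of what a --no-mount invocation could look like. This rests on assumptions: that the installed Singularity version accepts bind-paths (all bind path entries from singularity.conf) or an absolute destination path as a --no-mount value, which should be checked with singularity shell --help, and that build-container.sif is only a placeholder image name. The function prints the command rather than running it.

```shell
# Placeholder image name (assumption); substitute the actual build container.
BUILD_CONTAINER="${BUILD_CONTAINER:-build-container.sif}"

# Print the command that would start a shell in the build container with
# the given mount disabled. Nothing is executed here.
no_mount_shell_cmd() {
    printf 'singularity shell --no-mount %s %s\n' "$1" "$BUILD_CONTAINER"
}

# Disable every bind path entry from singularity.conf (if supported) ...
no_mount_shell_cmd bind-paths
# ... or only the entry for /vscmnt (if supported).
no_mount_shell_cmd /vscmnt
```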
What about -c / --contain? Does that still mount these file systems?
Looks like --contain may work, I'll try rebuilding OpenMPI using a build container started with --contain and see what that gives...
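One way to check this before rebuilding, as a sketch: count the mounts at or below /vscmnt that are visible inside a container started with --contain. The helper count_mounts_under and the image name build-container.sif are placeholders for illustration; a result of 0 would suggest the NFS mounts no longer leak in.

```shell
# Hypothetical helper: read a /proc/mounts-style listing on stdin and count
# the entries whose mount point is the given prefix or lies beneath it.
count_mounts_under() {
    awk -v p="$1" '$2 == p || index($2, p "/") == 1 { n++ } END { print n + 0 }'
}

# Placeholder image name (assumption); substitute the actual build container.
BUILD_CONTAINER="${BUILD_CONTAINER:-build-container.sif}"

# Only run the check when singularity and the image are actually available.
if command -v singularity >/dev/null 2>&1 && [ -f "$BUILD_CONTAINER" ]; then
    singularity exec --contain "$BUILD_CONTAINER" cat /proc/mounts \
        | count_mounts_under /vscmnt
else
    echo "singularity or $BUILD_CONTAINER not available; skipping check" >&2
fi
```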