DrDaveD

Results: 262 comments by DrDaveD

Unfortunately I have no experience with MPI and don't have access to a machine that has it. So I'm going to have to rely on others to debug further. It...

`unshare -r` doesn't actually run as root, it just fools processes into thinking they are. It's a root-mapped unprivileged user namespace. So it would be helpful to see if it...
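To make the "root-mapped" point concrete, here is a minimal Python sketch that interprets the contents of `/proc/<pid>/uid_map`. The function names and the sample UID 1000 are hypothetical, chosen only for illustration; the three-column layout (ID inside the namespace, ID outside the namespace, length of the range) is the format the Linux kernel exposes.

```python
def parse_uid_map(text):
    """Parse uid_map text into (inside_id, outside_id, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(f) for f in fields))
    return entries


def is_root_mapped_unprivileged(text):
    """Return True if UID 0 inside the namespace maps to a non-root UID outside."""
    for inside, outside, length in parse_uid_map(text):
        if inside == 0 and length > 0:
            # UID 0 in the namespace corresponds to `outside` on the host.
            return outside != 0
    # No mapping for UID 0 at all, so nothing inside appears as root.
    return False


# Hypothetical sample: what a process started by `unshare -r` would
# typically see if the invoking user had UID 1000 on the host.
sample_unshare = "0 1000 1\n"
# The initial (host) namespace maps every ID to itself.
sample_host = "         0          0 4294967295\n"
```

Passing the text of `/proc/self/uid_map` read from inside an `unshare -r` shell would be expected to return `True`, which is the sense in which the process only appears to be root.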

Nothing changed in PID namespaces that I can think of, but Apptainer 1.1 does now use unprivileged user namespaces by default and Singularity 3.8 didn't. The issue doesn't occur in...

@mcuma I tried building a sif from `docker://glotzerlab/software:2019.09-cuda9-mlx-openmpi3.0.0` and then I got the following error: ``` $ mpirun -np 2 apptainer exec issue769.sif true [wc.fnal.gov:12676] PMIX ERROR: BAD-PARAM in file...

Well you always have the option of installing `apptainer-suid`.

That works for me if I use the correct invocation and execute a test MPI program instead of "true". I mean it actually works, it does not reproduce the problem:...

@kcgthb So it looks like the use of ucx is what causes the breakage. I tried making my own container using the configure `--with-ucx` flag but I got ``` configure:...

Good, now I can finally reproduce the problem, using that change in container build recipe. I confirm that setting `UCX_POSIX_USE_PROC_LINK=n` works around the problem. I also found that running the...
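As a rough sketch of how the `UCX_POSIX_USE_PROC_LINK=n` workaround could be applied from a launcher script, the Python below sets the variable in a copy of the environment and builds the `mpirun` command line. The helper names and the `./mpi_test` program are hypothetical, and the sketch assumes the variable reaches the containerized ranks through the inherited host environment, which may not hold for multi-node runs or when the environment is cleaned.

```python
import os


def ucx_workaround_env(base_env=None):
    """Return a copy of the environment with the UCX workaround set."""
    env = dict(os.environ if base_env is None else base_env)
    # Setting reported in the comment above as working around the problem.
    env["UCX_POSIX_USE_PROC_LINK"] = "n"
    return env


def build_command(image, program, nprocs=2):
    """Build an mpirun command that runs `program` inside the container image."""
    return ["mpirun", "-np", str(nprocs), "apptainer", "exec", image, program]
```

A launcher could then call `subprocess.run(build_command("issue769.sif", "./mpi_test"), env=ucx_workaround_env())`, where `./mpi_test` stands in for whatever MPI test program is being run.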

However, using a sandbox instead of a sif file does not work around the problem, so that makes me less hopeful that #759 will help because that also makes all the...

I have put in some more debugging messages looking at the `/proc` entries that UCX is complaining about, and discussed with @cclerget, but we can't think of anything that UCX...