user namespace setup support

Open trws opened this issue 4 years ago • 30 comments

On LLNL TOSS3 systems, users can invoke a slurm plugin with --userns that configures their assigned compute nodes to allow the use of user/fs/etc. namespaces. The main use case for this is running container software like podman and buildah, or sandboxed apps like Flatpaks, snaps, or similar. As it stands now, TOSS4 has no similar option.

As I see it, there are two ways this could get resolved. If user namespaces for regular users are enabled by default on TOSS4, then we probably don't need to do anything in flux at all. That is currently not the case, but I'm not sure why it couldn't be. If the plan is to keep it locked down like in TOSS3, then we'll need this for some of the container building workflows we're starting to stand up.

trws avatar Oct 29 '21 22:10 trws

Tagging @tpatki since this can be related to her FY22 ISCP project.

dongahn avatar Oct 29 '21 22:10 dongahn

If this requires a call to unshare() then we may have to implement this as an IMP "plugin" - a facility that unfortunately does not yet exist. Do you know where the source of the userns plugin lives so we can see what other setup might be required? (Edit: never mind, I found that the plugin is a locally developed spank/lua plugin, so at least for TOSS3 I have a good idea of what is being set up -- it doesn't seem to require unshare() but instead manipulates /etc/subuid and /etc/subgid.)

I wonder if there is a strong argument for enabling user namespaces by default on the compute nodes, though, since every job can conceivably use --userns and get that support anyway? I'm not aware enough of the side-effects of using a user namespace to know if users would want to normally run without them enabled if they are not needed, though.
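
For anyone who wants to see what such a plugin would be editing: /etc/subuid and /etc/subgid hold one `name:start:count` entry per line, where the first field is a login name or numeric ID. The sketch below is only an illustration of reading that format, not the LC plugin; the function name `subid_ranges` and the sample users are made up, and the demo runs against a throwaway file rather than the real /etc/subuid.

```shell
#!/bin/sh
# subid_ranges FILE USER: print "start count" for every range that FILE
# (in /etc/subuid format, name:start:count) assigns to USER.
subid_ranges() {
    awk -F: -v user="$2" '$1 == user { print $2, $3 }' "$1"
}

# Demo against a temporary file so nothing on the real system is touched.
# The users and ranges here are hypothetical.
demo=$(mktemp)
printf 'alice:100000:65536\nbob:165536:2048\n' > "$demo"
subid_ranges "$demo" bob
rm -f "$demo"
```

On a real node, something like `subid_ranges /etc/subuid "$USER"` would show whether the current user has any subordinate UID range at all, which is the precondition for rootless podman to map more than one UID.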

grondo avatar Oct 29 '21 22:10 grondo

As a user, there is no reason at all to ever have the support unavailable. It costs nothing. The only reason to keep it turned off is if someone thinks there's a potential security or other issue, but if we can just turn it on with slurm, I'm not sure why that reason would be good enough.

trws avatar Oct 29 '21 22:10 trws

Found this: https://slurm.schedmd.com/SLUG19/NERSC_job_container.pdf

garlick avatar Oct 29 '21 22:10 garlick

As noted in #4758, it seems that user namespaces are enabled on TOSS 4. The question remains if that is sufficient to allow flux to launch full OS containers (like singularity) without adding any special support to the IMP or elsewhere. Can we do that experiment and report what, if anything, is needed from Flux to support launching jobs in containers? I think that is the question that motivated this issue. The other cool namespacey things we can do if allowed can have their own issues as we dream them up.

garlick avatar Nov 09 '22 19:11 garlick

@trws brought up the idea of adding support to flux for mounting user-created file system images in private namespaces for jobs on demand. He pointed out that fuse2fs(1) (part of e2fsprogs) allows one to mount an ext4 block device image that was created by a user without the usual kernel oops/security concerns of doing that with a direct kernel ext4 mount.

He says the following runes were sufficient to make a fuse2fs mount possible.

unshare --user --mount --map-root-user <shell>

So maybe a shell plugin that just takes a path to an image file in a shared file system and mounts it in FLUX_JOB_TMPDIR?

garlick avatar Dec 20 '23 20:12 garlick

> Can we do that experiment and report what, if anything, is needed from Flux to support launching jobs in containers? I think that is the question that motivated this issue. The other cool namespacey things we can do if allowed can have their own issues as we dream them up.

> So maybe a shell plugin that just takes a path to an image file in a shared file system and mounts it in FLUX_JOB_TMPDIR?

Is there still a need to do this experiment and determine next steps for supporting user namespaces? Is that something we're willing or even want to support? Is anybody asking for it? I'm doing a pass of some of these issues to get an idea of where we're at.

Circling back after the fluxion meeting this week (when we discussed the work I'm doing with ephemeral ext4 files in lustre), it would be nice to have a rootless alternative, particularly if it also helps with container support or other areas we're exploring. But it seems like maybe we went down this road already, and decided against it?

wihobbs avatar Aug 23 '24 17:08 wihobbs

We still need a solution for ensuring users have additional UID mappings in the LC environment, ideally matching the slurm --userns option we have as a plugin on toss4 slurm systems.

trws avatar Aug 23 '24 18:08 trws

@trws and I talked over email to Elena Green this morning, who wrote the --userns SPANK plugin for LC. Since LC has amended our subuid and subgid management strategy, we no longer have to support this to support containers in LC.

Ok to close this issue? We could also leave it open for some of the "nice to haves" that have been discussed here:

  • container image pre-propagation to /var/tmp so that users aren't hammering the network filesystem fetching containers in parallel (a prolog might be able to do this quite trivially)
  • a plugin that takes an ext4 device and uses fuse2fs to mount in a mount namespace

wihobbs avatar Sep 12 '24 19:09 wihobbs

Nice!

For my education, how does one launch, say, an MPI hello world program in a container under Flux on an LC system? Have we tried it?

garlick avatar Sep 12 '24 19:09 garlick

I have not yet. It sounds like they basically did what I was hoping for, though, which was to set up the sub*id mappings for everyone by default on every node with a smaller range. If that's right, then flux run -N 5 -n 40 podman run some_registry:mpich_hello_world should just go, or maybe after running the setup script. It would also mean we could run real containers on the login nodes for the first time, which would be fantastic.

trws avatar Sep 12 '24 22:09 trws

Yes, that's correct. However, there are a few issues with podman itself preventing MPI from working out-of-the-box like you describe. I've communicated those issues to the podman development team and hope to see progress soon.

egreen77 avatar Sep 12 '24 23:09 egreen77

That’s good to know. If it isn’t sensitive, is it bootstrapping issues or NIC access or what kind of thing are we hitting?

trws avatar Sep 12 '24 23:09 trws

The main issue is that podman doesn't pass fds from the job step launcher to the process inside the container, preventing the process from communicating with the PMI.

Podman does have an option --preserve-fds=N to pass in a block of fds, but it expects those fds to be in a contiguous range between 3 and 3+N and it'll abort if any of those fds aren't present. The PMI communicator fds that the runtime expects are usually in a disconnected range and don't play well with that.

Newer versions of podman provide a --preserve-fd argument to preserve specific fds, but that still requires some introspection at launch time by a wrapper script to determine the correct file descriptor values to provide to podman.
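
A minimal sketch of the descriptor shuffle this implies, with no podman involved: duplicate whatever descriptor the launcher handed over onto fd 3, so that a launcher which only passes a contiguous block starting at 3 would carry it along. The helper name `run_on_fd3` and the use of fd 9 as the stand-in source are assumptions for illustration only.

```shell
#!/bin/sh
# run_on_fd3 FD CMD...: run CMD with descriptor FD duplicated onto fd 3,
# the first slot of a "preserve N fds starting at 3" block.
# The dup works the same for a read/write socket as for a plain file.
run_on_fd3() {
    src_fd=$1
    shift
    "$@" 3<&"$src_fd"
}

# Demo: fd 9 stands in for the launcher-provided descriptor (hypothetical
# value; in a real job the number would be read from the environment).
tmp=$(mktemp)
echo "hello from fd 9" > "$tmp"
exec 9< "$tmp"
run_on_fd3 9 sh -c 'cat <&3'   # the child reads the same data on fd 3
exec 9<&-
rm -f "$tmp"
```

The child still has to be told that its descriptor is now 3, since the number it was originally given no longer applies inside the wrapped command.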

egreen77 avatar Sep 12 '24 23:09 egreen77

Ah, OK. Flux uses "simple PMI" by default, where the PMI_FD environment variable contains the single file descriptor number that's expected to be used by the MPI proc.

garlick avatar Sep 13 '24 01:09 garlick

That makes sense; we could probably help someone write a little OCI hook to handle that. I may take a glance at it. I think it would be pretty trivial since the FD number is accessible in the environment, but I would have to check that the hook can access the option.

trws avatar Sep 13 '24 16:09 trws

Actually, digging around a bit, this might be relevant: https://github.com/containers/podman/issues/10410#issuecomment-845125178 MPICH added a --pmi-port option to hydra to have it expose the PMI interface on a socket. If that uses the same simple protocol, it might be really easy to add support for, maybe even expose both? Have to see how the env var values change. Apparently it has been there since MPICH 3.4.x at least.

trws avatar Sep 13 '24 17:09 trws

Sorry for the flood here, but it looks like it’s literally just set up a socket to accept connections, put the port number in PMI_PORT, and pass the resulting connected socket to the regular pmi-simple logic if someone connects. I kinda love PMI simple.

trws avatar Sep 13 '24 17:09 trws

If that's the PMI_PORT stuff, I'm not sure it's going to be compiled-in on any of our MPI implementations. It is not enabled by default, one has to configure mpich with --enable-pmiport. It's also insecure (although we could bind to localhost and have it not be too bad)

Hmm, the MPIs will dlopen libpmi.so if PMI_FD is not set (might have to verify that). That could internally use PMI_PORT if set. An improvement over that might be to add a Flux RPC transport for the PMI server so that libpmi.so could securely connect to the PMI server via flux's local:// connector.

PMIx already uses a unix domain socket so I would imagine openmpi with pmix would just work here. We're really only talking about the MPICH derivatives.

garlick avatar Sep 13 '24 18:09 garlick

Yeah, OpenMPI should work, but if we want the library to be loaded we would have to bind mount it into the container and do a great deal of extra intrusive work to do that. Even domain sockets would need to be bind mounted in, but at least wouldn’t require injecting a library that may or may not be compiled with a compatible libc.

trws avatar Sep 13 '24 21:09 trws

Oh ick, good point.

garlick avatar Sep 13 '24 21:09 garlick

@garlick to your earlier question, here is an example of how to run a super simple single-user container on a Flux-scheduled cluster in LC. Note that I did not try running an MPI hello world due to the limitations Tom brought up. But you can run python, and probably jupyterhub too (I didn't bother with VNC to actually see, although it said it launched successfully), which is something I've seen users clamoring for in mattermost.

flux alloc -N1
(s=1,d=1)  corona197 ~ $ /admin/scripts/weg/enable-podman.sh # do each time, adds things in /tmp
(s=1,d=1)  corona197 ~ $ podman pull jupyterhub/singleuser
Resolved "jupyterhub/singleuser" as an alias (/g/g0/hobbs17/.cache/containers/short-name-aliases.conf)
Trying to pull docker.io/jupyterhub/singleuser:latest...
Getting image source signatures
Copying blob 9bf717fe8843 done   |
...
Writing manifest to image destination
f00aa96928ba3b1a852cca75dc05b759e3893be67e225f29f231b6a730dc474b
WARN[0045] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus
(s=1,d=1)  corona197 ~ $ podman run --rm -it jupyterhub/singleuser python3
Entered start.sh with args: python3
Running hooks in: /usr/local/bin/start-notebook.d as uid: 1000 gid: 100
Done running hooks in: /usr/local/bin/start-notebook.d
Running hooks in: /usr/local/bin/before-notebook.d as uid: 1000 gid: 100
Sourcing shell script: /usr/local/bin/before-notebook.d/10activate-conda-env.sh
Done running hooks in: /usr/local/bin/before-notebook.d
Executing the command: python3
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

I was unable to run OS containers like ubuntu due to insufficient subgid space, which I guess is to be expected since every user only gets 2048:

root@7168706153ff:/bin# apt update
E: setgroups 65534 failed - setgroups (22: Invalid argument)
E: setegid 65534 failed - setegid (22: Invalid argument)
Reading package lists... Done
E: setgroups 65534 failed - setgroups (22: Invalid argument)
E: setegid 65534 failed - setegid (22: Invalid argument)
E: Method gave invalid 400 URI Failure message: Failed to setgroups - setgroups (22: Invalid argument)
E: Method gave invalid 400 URI Failure message: Failed to setgroups - setgroups (22: Invalid argument)
E: Method http has died unexpectedly!
E: Sub-process http returned an error code (112)

And I didn't try running MPI due to the limitations described above. But I suppose maybe something to try at a later date.

wihobbs avatar Sep 16 '24 13:09 wihobbs

Sorry, the above note about ubuntu containers is...confusing (apologies, only 1 cup of coffee so far!) I could run them, but I'd say they're not super useful, as they throw the error above whenever you try to install anything (like compilers for trying to build a hello-world).

wihobbs avatar Sep 16 '24 13:09 wihobbs

Here's a little primer on running containers in LC: https://hpc.llnl.gov/services/cloud/containers. Some of the "gotchas" are probably different with the new way of managing sub[u/g]ids.

wihobbs avatar Sep 16 '24 13:09 wihobbs

We haven't updated the documentation to match the new subuid scheme yet. apt failed because Debian distros need either an apt config edit or a custom mapping because they use UID 65534 as a sandbox user.

Try using:

$ podman run --uidmap 0:0:2000 --uidmap 65534:2047:1 ...
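
To make the arithmetic in those two --uidmap arguments concrete, here is a small sketch that resolves a container UID through a list of `container_start:from_start:count` triples, which is how podman documents the option. The function name `map_uid` is made up. The note that, for rootless podman, the second field counts positions in the user's own allotment (0 being the user, 1 and up the subordinate IDs) is my reading of the podman documentation rather than something confirmed in this thread.

```shell
#!/bin/sh
# map_uid CONTAINER_UID MAP...: each MAP is container_start:from_start:count.
# Prints the ID that CONTAINER_UID maps to, or returns 1 if no MAP covers it.
map_uid() {
    uid=$1
    shift
    for m in "$@"; do
        c=${m%%:*}          # first field: start of the container range
        rest=${m#*:}
        h=${rest%%:*}       # second field: start of the mapped-to range
        n=${rest#*:}        # third field: number of IDs in the range
        if [ "$uid" -ge "$c" ] && [ "$uid" -lt $(($c + $n)) ]; then
            echo $(($h + $uid - $c))
            return 0
        fi
    done
    return 1
}

# Container UID 65534 (the Debian sandbox user) lands on slot 2047,
# which fits inside a 2048-entry allotment.
map_uid 65534 0:0:2000 65534:2047:1   # prints 2047
```

Without the second mapping, UID 65534 falls outside 0..1999 and has no mapping at all, which is consistent with the "Invalid argument" errors shown above.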

egreen77 avatar Sep 16 '24 15:09 egreen77

I wonder if we could mitigate some of these with a couple of hook scripts. The MPI thing I'm going to try and play with today and see if I can find a basic workaround.

trws avatar Sep 17 '24 00:09 trws

Ok, this is vaguely awful, but it does actually work on LC with mpich in podman today:

command: flux run -N 2 -n 4 ./podman-test.sh

script podman-test.sh:

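 #!/bin/sh
 # Flux's simple PMI passes the PMI descriptor number in PMI_FD.
 echo "got $PMI_FD"

 # The trailing 3>&$PMI_FD duplicates that descriptor onto fd 3 so that
 # --preserve-fds=1 (which passes fds starting at 3) carries it into the
 # container; the MPI program is then pointed at it with PMI_FD=3.
 exec podman run -v "$(pwd):$(pwd)" \
   --net=host --pid=host --ipc=host \
   --preserve-fds="1" \
   -w "$(pwd)" --rm -it -u "0:0" \
   --env-host docker.io/mfisherman/mpich:latest \
   /bin/sh -c "PMI_FD=3 ./mpi-test" 3>&$PMI_FD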

With a slightly newer podman the FD manipulation isn't necessary, I tested this on my workstation and it's somewhat less painful:

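 #!/bin/sh

 # --preserve-fd passes the named descriptor through directly; this relies
 # on it keeping its number, so the PMI_FD value inherited via --env-host
 # is still valid inside the container.
 exec podman run -v "$(pwd):$(pwd)" \
   --net=host --pid=host --ipc=host \
   --preserve-fd="$PMI_FD" \
   -w "$(pwd)" --rm -it -u "0:0" \
   --env-host docker.io/mfisherman/mpich:latest \
   ./mpi-test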

trws avatar Sep 17 '24 03:09 trws

FWIW I also hacked together a version of the script that uses socat to connect the FD to a port, and supplies the port along with the hostname to the executable. Our mpich does in fact have PMI_PORT support, or at least it connects, but we get a pmi-simple wire protocol error so there must be something it asks for that we don't support because FD doesn't use it.

trws avatar Sep 17 '24 04:09 trws

Oh that is good to know!

There is an additional 'fullinit' protocol element that PMI_PORT requires since the peer rank is not known in advance like with PMI_FD. I should just go ahead and add that and document it.

garlick avatar Sep 17 '24 05:09 garlick

In case you want to play with it, here's the script:

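#!/bin/bash
# Bridge the Flux PMI descriptor to a per-rank TCP port so that the MPICH
# inside the container can reach it via PMI_PORT instead of PMI_FD.
echo "got $PMI_FD"
env | grep PMI

export PORT=$((12345 + $PMI_RANK))
export PMI_PORT=localhost:$PORT
export PMI_ID=$PMI_RANK
# Relay between the inherited descriptor and a listening TCP socket.
socat -s -4 "fd:$PMI_FD" "tcp-listen:$PORT" &

PID=$!
unset PMI_FD
podman run -v "$(pwd):$(pwd)" \
  --net=host --pid=host --ipc=host \
  -w "$(pwd)" --rm -it -u "0:0" \
  --env-host docker.io/mfisherman/mpich:latest \
  /bin/sh -c "./mpi-test"
# Stop the relay once the container exits.
kill "$PID"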

And this is the error with -overbose=2:

26.598s: flux-shell[0]: TRACE: pmi-simple: 2: C: cmd=initack pmiid=2
26.598s: flux-shell[0]: TRACE: pmi-simple: 2: S: pmi request error
26.598s: flux-shell[0]: FATAL: pmi-simple: PMI-1 wire protocol error

trws avatar Sep 17 '24 05:09 trws