Mark Grondona
Didn't forget about this one @chu11 - on my list to look at this week.
> So I'd probably start breaking down (a) here. Currently, the `job-exec` module in the parent monitors for lost connections to job shells (i.e. when the exec protocol gets a...
@ofaaland, as an experiment, I added support for a new jobspec attribute `exec.ignore-lost-ranks` which, when set, causes the `job-exec` module to raise a non-fatal exception when it loses contact with...
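A toy model of the decision described above may help. This is a minimal sketch, not `job-exec` code: `on_lost_shell`, `JobException`, the `"lost-shell"` exception type, and the nested-dict layout of the attributes are all hypothetical. The only Flux convention it leans on is that an exception with severity 0 is fatal and any higher severity is non-fatal; the assumption that the default (attribute unset) stays fatal is an inference from "non-fatal ... when set".

```python
from dataclasses import dataclass

# Hypothetical illustration only; these names are not job-exec internals.
FATAL = 0      # Flux convention: severity 0 terminates the job
NONFATAL = 1   # any severity > 0 is treated as non-fatal


@dataclass
class JobException:
    type: str
    severity: int
    note: str


def on_lost_shell(rank: int, attributes: dict) -> JobException:
    """Decide how to react when contact with the job shell on `rank` is lost.

    `attributes` is assumed to hold the relevant jobspec attributes as a
    nested dict, e.g. {"exec": {"ignore-lost-ranks": True}}.
    """
    ignore = bool(attributes.get("exec", {}).get("ignore-lost-ranks", False))
    return JobException(
        type="lost-shell",  # hypothetical exception type name
        severity=NONFATAL if ignore else FATAL,
        note=f"lost contact with job shell on rank {rank}",
    )
```

With the attribute unset, `on_lost_shell(3, {})` yields a severity-0 (fatal) exception; with `{"exec": {"ignore-lost-ranks": True}}` the same loss is reported but the job keeps running.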
@ofaaland: I've gotten a little further in #4615, though it is pretty experimental at this point.
FYI - #4615 is now proposed for merging. When the PR is merged, I think we could close this issue. The most recent version of the PR adopts @garlick's idea...
Duplicate of #2801. Note: forwarding currently works in TOSS 3 for the reasons described in #2801: ssh forwarding on our clusters allows connections over the cluster-local network, and since `DISPLAY`...
I'm actually unsure this needs direct support in Flux, at least at this early stage; it seems more like a general site configuration issue. There are two...
The only other thing I can think of is that the background `ssh` process is holding the stdout/err file descriptors open. Does running ssh with `>/dev/null 2>&1` help? I had mistakenly...
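The theory above (a reader never sees EOF on a pipe while any background child still holds its write end) can be demonstrated in isolation. This is a minimal sketch that uses `sleep 1 &` as a stand-in for the background `ssh`; `time_with_background_child` is a hypothetical helper, not part of Flux or its test suite.

```python
import subprocess
import time


def time_with_background_child(cmd: str) -> float:
    """Run `cmd` under sh with stdout/stderr captured; return elapsed seconds."""
    start = time.monotonic()
    # capture_output=True makes run() read both pipes until EOF, so it cannot
    # return while any process (including a backgrounded child) still holds
    # a write end of either pipe open.
    subprocess.run(["sh", "-c", cmd], capture_output=True, check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # The background sleep inherits the pipes, so run() waits for it to exit.
    held = time_with_background_child("sleep 1 &")
    # Redirecting the child's stdout/stderr releases the pipes immediately.
    redirected = time_with_background_child("sleep 1 >/dev/null 2>&1 &")
    print(f"inherited pipes: {held:.2f}s, redirected: {redirected:.2f}s")
```

The first call takes roughly as long as the background `sleep`, while the second returns almost immediately even though `sh` exited just as quickly in both cases, which is the same symptom a backgrounded `ssh` would produce.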
> Edit: yeah, I guess it would if you did some dumb debugging like this

I ran all the current tests that contain `flux job list` with `-d -v` and...
Oh, good guess. I'm wondering if you could test this theory by adding

```
LimitMEMLOCK=unlimited
```

to `/etc/systemd/system/flux.service.d/override.conf` and restarting the brokers? Note: untested, the `unlimited` syntax may not be...
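After restarting, one way to confirm the override actually took effect is to read the locked-memory limit of a running broker from `/proc/<pid>/limits` on Linux. This is a minimal sketch: `parse_memlock` and `broker_memlock` are hypothetical helpers, and obtaining the PID via `systemctl show -p MainPID flux` assumes the unit's main process is the broker itself.

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_memlock(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) for 'Max locked memory' from /proc/<pid>/limits text."""
    prefix = "Max locked memory"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units ("bytes").
            fields = line[len(prefix):].split()
            return fields[0], fields[1]
    return None  # limit line not present


def broker_memlock(pid: int) -> Optional[Tuple[str, str]]:
    """Read the locked-memory limits of a running process (Linux only)."""
    return parse_memlock(Path(f"/proc/{pid}/limits").read_text())
```

If the override was applied, `broker_memlock(pid)` should report `("unlimited", "unlimited")`; a small numeric value such as `("65536", "65536")` would suggest the limit was not picked up.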