Mark Grondona comments

Results: 668 comments of Mark Grondona

job-exec/sdexec: enable resource control properties MemoryMax, MemoryHigh, AllowedCPUs, etc.

Didn't forget about this one @chu11 - on my list to look at this week.

use case: SCR needs allocations to tolerate single node failure

> So I'd probably start breaking down (a) here. Currently, the `job-exec` module in the parent monitors for lost connections to job shells (i.e. when the exec protocol gets a...

use case: SCR needs allocations to tolerate single node failure

@ofaaland, as an experiment I added support for a new jobspec attribute `exec.ignore-lost-ranks` which, when set, causes the `job-exec` module to raise a non-fatal exception when it loses contact with...

use case: SCR needs allocations to tolerate single node failure

@ofaaland: I've gotten a little further in #4615, though it is pretty experimental at this point.

use case: SCR needs allocations to tolerate single node failure

FYI - #4615 is now proposed for merging. When the PR is merged, I think we could close this issue. The most recent version of the PR adopts @garlick's idea...

support forwarding of X11 display from allocation

Duplicate of #2801. Note: forwarding currently works in TOSS 3 for the reasons described in #2801: ssh forwarding on our clusters allows connections over the cluster-local network, and since `DISPLAY`...

support forwarding of X11 display from allocation

I'm actually unsure this is something that needs support directly in Flux, at least at this early stage, but is more of a general site configuration issue. There are two...

support forwarding of X11 display from allocation

Only other thing I can think of is that the background `ssh` process is holding the stdout/err file descriptors open. Does running ssh with `>/dev/null 2>&1` help? I had mistakenly...
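The file-descriptor theory in the comment above can be checked in isolation with a minimal sketch. This one assumes a POSIX shell and uses `sleep 3` as a hypothetical stand-in for the background `ssh` process; the variable names are illustrative only. A reader of a pipe sees EOF only after every process holding the write end has closed it, so the first command substitution should block until `sleep` exits, while the redirected one should return almost immediately.

```shell
#!/bin/sh
# "sleep 3" is a stand-in for the long-running background ssh process (assumption).

# Case 1: the background process inherits stdout (the pipe's write end),
# so the command substitution cannot finish until it exits.
start=$(date +%s)
out=$(sleep 3 & echo started)
slow=$(( $(date +%s) - start ))

# Case 2: stdout/stderr of the background process are redirected to /dev/null,
# so nothing holds the pipe open once the subshell has exited.
start=$(date +%s)
out=$(sleep 3 >/dev/null 2>&1 & echo started)
fast=$(( $(date +%s) - start ))

echo "inherited stdout: ${slow}s, redirected: ${fast}s"
```

If the same pattern applies to the `ssh` case, the caller would stop waiting as soon as the foreground command finishes, even though the background process keeps running.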

flux-job: point users to flux-jobs(1)

> Edit: yeah, I guess it would if you did some dumb debugging like this I ran all the current tests that contain `flux job list` with `-d -v` and...

mvapich2-tce: cannot create cq

Oh, good guess. I'm wondering if you could test this theory by adding `LimitMEMLOCK=unlimited` to `/etc/systemd/system/flux.service.d/override.conf` and restarting the brokers? Note: untested, the `unlimited` syntax may not be...