
job-shell: add plugin to run user-level prologue and epilogue scripts

Open SteVwonder opened this issue 5 years ago • 7 comments

Per a conversation on Slack with @grondo and @dongahn:

Just to confirm, we don't have epilogue script support currently (just a placeholder), but we do have post-task support in job shell plugins, right?

The epilog script would run as root, so we need IMP support

we could trivially add a plugin to run a script as the user (could even be done directly in the initrc)
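For illustration, a rough sketch of what such an initrc plugin might look like, assuming the Lua plugin.register and shell.getenv interfaces described in flux-shell(1). The FLUX_USER_PROLOG and FLUX_USER_EPILOG variable names and the idea of pointing them at user scripts are hypothetical, not an existing Flux convention:

```lua
-- Sketch only: run optional user-supplied prologue/epilogue scripts from the
-- job shell initrc. These run as the job user, not as root. The environment
-- variable names below are hypothetical, not an existing Flux convention.
local function run_user_script (var)
    local path = shell.getenv (var)
    if path then
        shell.log ("running user script " .. path)
        os.execute (path)
    end
end

plugin.register {
    name = "user-prolog-epilog",
    handlers = {
        { topic = "shell.init",
          fn = function () run_user_script ("FLUX_USER_PROLOG") end },
        { topic = "shell.exit",
          fn = function () run_user_script ("FLUX_USER_EPILOG") end },
    }
}
```

A user could then opt in by setting those variables in the job environment at submission time; since everything runs as the job user, this only covers unprivileged cleanup.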

SteVwonder avatar Aug 20 '20 21:08 SteVwonder

What is the use case for user prolog/epilog? (i.e. is there a better way to do what the users need?)

grondo avatar Aug 20 '20 22:08 grondo

That's a great question. TBH, I missed the full use-case. Something related to tools cleanup. Maybe @dongahn can summarize the use case better than me.

SteVwonder avatar Aug 21 '20 00:08 SteVwonder

Olaf Faaland wants to make use of prologue and epilogue scripts on Elmerfudd to:

(1) Run a script to clean up /dev/shm after a job, so that a user who writes data there doesn't reduce the amount available to the next user (see the sketch after this list).

(2) Drop caches after a job, e.g. echo 3 >/proc/sys/vm/drop_caches

My interest in this is primarily to ensure that data and metadata written to a remote file system such as Lustre is flushed to disk before the node is made available to other users. This is partially so that we find out about a problem as early as possible and minimize damage done, and partially so that one user can't hurt the following user's performance.

(3) Run a script to set up and destroy a local ephemeral file system, for use by the user.

One example is connecting to a remote NVMe device via NVMe-over-Fabrics, formatting the connected device with a file system such as xfs, and setting permissions so that the user can write to it; and then undoing that after the job is complete.

(4) Run a script to set up and destroy a shared ephemeral file system, for use by the user.

Another example is setting up and destroying a shared GFS2 file system. Unlike the local file system setup/destroy case, this would likely need to know the set of nodes participating in the job.
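Of these, only use case (1) is even partially reachable from an unprivileged job-shell plugin, since it can at most remove the job user's own files. A minimal hypothetical initrc sketch, again assuming the Lua plugin.register interface from flux-shell(1):

```lua
-- Hypothetical sketch for use case (1): remove the job user's own files from
-- /dev/shm when the job shell exits. Running as the user, it cannot touch
-- other users' files, drop kernel caches, or set up file systems (cases 2-4),
-- all of which require root.
plugin.register {
    name = "shm-cleanup",
    handlers = {
        { topic = "shell.exit",
          fn = function ()
              os.execute ("find /dev/shm -user \"$(id -un)\" -delete 2>/dev/null")
          end },
    }
}
```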

jameshcorbett avatar Oct 13 '21 19:10 jameshcorbett

I'm absolutely open to other/better ways to accomplish those tasks.

ofaaland avatar Oct 13 '21 20:10 ofaaland

Unfortunately a job-shell plugin won't work for any of these use cases, since it runs as the job user, not as a privileged process.

We do have support in the IMP (setuid helper) for a job prolog and epilog, which run as root, but the exec system doesn't have support for invoking the prolog/epilog yet, since that was waiting on the Big Rewrite™ #3346.

If this is high priority, it might just be a couple days to a week of work to support prolog/epilog in the current job-exec module.

grondo avatar Oct 13 '21 20:10 grondo

We could perhaps work around the "job-shell plugin runs as a user" issue with some creative sudo.d configuration and scripting, but I wonder if prolog/epilog support isn't important for other testing.

Addressing those use cases somehow is definitely required for us to use elmerfudd with flux.

ofaaland avatar Oct 13 '21 20:10 ofaaland

@ofaaland @jameshcorbett - I moved the discussion over to #2205, since these use cases require full prolog/epilog support and this issue is about a "user prolog/epilog".

grondo avatar Oct 13 '21 20:10 grondo