flux-core
job-exec module needs a rewrite and new protocol
There are several (more than several) open issues against the job-exec module. This module really needs a complete re-implementation, perhaps preceded by a thoughtful RFC describing the protocol so that we avoid corner cases.
Outstanding issues in the current design include:
- restart of rank 0 broker not supported (#2874)
- job epilog support is not currently enabled and does not support "partial release" (#2317, #2204)
- job shell and early standard output/error are lost (#2206, #3185)
- launching all shells from rank 0 of the instance is not scalable
The following use case should also be considered in the design:
- exec system bypass for special workloads (#2924)
I think moving the launch of job shells off of rank 0 to the first rank of the job will help a lot with job throughput, and will also give the exec module a way to possibly support restart on rank 0. We'll have to think about how the job-exec system rediscovers running jobs at restart.
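A minimal sketch of how restart rediscovery might work if each job's shells are launched and monitored from the job's first rank. Everything here is hypothetical: the leader-rank policy (lowest assigned rank), the assumption that a restarted rank 0 can recover its list of running jobs and their ranks from persistent state, and the per-rank `shell_reports` query are illustrations, not the job-exec protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Reconciliation:
    still_running: set = field(default_factory=set)  # leader rank still reports the shell
    lost: set = field(default_factory=set)           # expected running, leader reports nothing
    orphaned: set = field(default_factory=set)       # a shell is reported, but job is unknown


def leader_rank(ranks):
    """Hypothetical policy: the lowest broker rank assigned to a job
    launches and monitors that job's shells."""
    return min(ranks)


def reconcile(running_jobs, shell_reports):
    """Compare what a restarted rank 0 believes is running with what ranks report.

    running_jobs:  dict of job ID -> list of assigned broker ranks
                   (hypothetically recovered from persistent job state)
    shell_reports: dict of broker rank -> set of job IDs with live shells there
    """
    result = Reconciliation()
    for job_id, ranks in running_jobs.items():
        leader = leader_rank(ranks)
        if job_id in shell_reports.get(leader, set()):
            result.still_running.add(job_id)
        else:
            result.lost.add(job_id)
    # Any job ID reported by some rank but absent from running_jobs is orphaned.
    reported = set().union(*shell_reports.values())
    result.orphaned = reported - set(running_jobs)
    return result
```

For example, `reconcile({"f1": [3, 4], "f2": [5]}, {3: {"f1"}, 7: {"f9"}})` classifies `f1` as still running, `f2` as lost (its leader no longer reports a shell), and `f9` as orphaned. Deriving the leader deterministically from the job's assigned ranks means a restarted rank 0 can recompute it without storing any extra state.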
This work still needs to be done, but some of the requirements have changed.
- Now that the strategy to support broker restart includes libsdprocess, moving execution of the job shell directly to the involved broker ranks is required (not only to allow restart of rank 0, nor only for scalability), because the sdexec exec implementation can only work locally. It is an open question whether normal execution should also work this way.
- Some of the referenced issues above have already been solved, but we should not regress them (e.g. reporting of early launch errors, handling job shell output).
Some other items to think about in the redesign:
- Partial release (#2204)
- Locally initiated prolog/epilog, replacing the need for an ad-hoc solution
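Partial release could be modeled as a per-job tracker in which each broker rank runs its epilog locally and reports completion, and completed ranks are handed back in batches rather than waiting for the slowest node. This is a hypothetical sketch; the class and method names are illustrative and do not correspond to an existing Flux interface.

```python
class PartialRelease:
    """Hypothetical per-job tracker for partial release of resources.

    Each rank reports when its local epilog has finished; flush() returns
    the ranks that can be released now instead of waiting for all ranks.
    """

    def __init__(self, job_ranks):
        self.pending = set(job_ranks)  # epilog still running on these ranks
        self.ready = set()             # epilog done, not yet released
        self.released = set()          # already handed back

    def epilog_done(self, rank):
        if rank not in self.pending:
            raise ValueError(f"rank {rank} is not pending for this job")
        self.pending.remove(rank)
        self.ready.add(rank)

    def flush(self):
        """Return (ranks to release now, True if no ranks remain pending).

        Once the final flag has been returned, further calls return an
        empty batch, so callers should stop after the first final release.
        """
        batch = sorted(self.ready)
        self.released.update(self.ready)
        self.ready.clear()
        return batch, not self.pending
```

For a tracker over ranks 0 through 3, calling `epilog_done(2)` and `epilog_done(0)` and then `flush()` returns `([0, 2], False)`; once ranks 1 and 3 report, the next `flush()` returns `([1, 3], True)`, which is where a final release of the job would be signaled.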
I'm sure there are other things I'm missing at the moment.
Wanted to document a side discussion with @grondo: we will also want a way to configure which "exec" mechanism to use. There should be:
- a default mechanism (subprocess, systemd, etc.)
- a way for the instance owner to use a non-default mechanism (e.g. testexec or something else)
- but no way for guests to choose a non-default mechanism
The current mechanism in job-exec for selecting which exec method to use makes the above somewhat tricky to do, so it has not been done yet.
(Related to #3970 but not quite the same)
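The selection policy described above (a configured default, owner-only overrides, no overrides for guests) might be expressed as a small function. This is a sketch under stated assumptions: the `ExecPolicy` structure, how a job requests a mechanism, and the `select_exec_impl()` helper are hypothetical, and a real check would live in job-exec's configuration and job validation handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecPolicy:
    default: str          # configured default mechanism, e.g. "subprocess" or "systemd"
    available: frozenset  # mechanisms this instance can actually use


def select_exec_impl(policy, requested, userid, owner_uid):
    """Return the exec mechanism to use for a job (hypothetical helper).

    requested: mechanism requested by the job, or None for the default.
    """
    if requested is None or requested == policy.default:
        return policy.default
    # Non-default mechanisms are reserved for the instance owner.
    if userid != owner_uid:
        raise PermissionError(
            f"user {userid} may not select exec mechanism {requested!r}")
    if requested not in policy.available:
        raise ValueError(f"unknown exec mechanism {requested!r}")
    return requested
```

With `ExecPolicy(default="subprocess", available=frozenset({"subprocess", "systemd", "testexec"}))`, the instance owner can request `testexec`, while the same request from a guest raises `PermissionError`. Checking ownership before availability means a guest cannot use the error messages to probe which non-default mechanisms are installed.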