flux-core

ingest feasibility check happens before jobspec-defaults are applied

Open garlick opened this issue 3 years ago • 5 comments

Problem: the job-ingest feasibility plugin sends jobspec to the scheduler for an early determination of request feasibility, but the jobspec-defaults jobtap plugin has not yet assigned default values for jobspec system attributes such as the queue.

Ideally we would move the feasibility check to a jobspec validate plugin. However, that may be tricky given that it requires an RPC whose response should be handled asynchronously. Maybe there is a clever way we could have the validate callbacks fulfill a composite future.
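One way to picture the composite-future idea is the sketch below. It is only an analogy: Python's asyncio stands in for Flux futures, and the validator names, the `ncores` field, and the `MAX_CORES` limit are all hypothetical. Each validator returns an error string or None, one of them awaits a simulated asynchronous RPC, and the combined result is available only once every validator has finished.

```python
import asyncio

MAX_CORES = 4  # hypothetical capacity reported by the scheduler


async def check_fields(jobspec):
    # Purely local check, completes immediately.
    if "ncores" not in jobspec:
        return "jobspec does not request any cores"
    return None


async def check_feasibility(jobspec):
    # Stand-in for an asynchronous RPC to the scheduler feasibility service.
    await asyncio.sleep(0)
    ncores = jobspec.get("ncores", 0)
    if ncores > MAX_CORES:
        return f"requested {ncores} cores but only {MAX_CORES} are available"
    return None


async def validate(jobspec, validators):
    # The "composite": it is fulfilled only when every validator is done.
    results = await asyncio.gather(*(v(jobspec) for v in validators))
    return [r for r in results if r is not None]


errors = asyncio.run(validate({"ncores": 8}, [check_fields, check_feasibility]))
print(errors)
```

The job would be accepted only when the returned list is empty, so a slow feasibility response delays the decision without blocking the other checks.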

garlick, Aug 15 '22 21:08

As a stopgap we can have the feasibility plugin read the defaults from configuration and send the amended jobspec to the scheduler feasibility service, with a note that the feasibility plugin can be removed once this issue is resolved.
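A minimal sketch of what amending the jobspec could look like, assuming the configured defaults have already been read into a plain dict. The `defaults` contents and the `apply_defaults` helper are hypothetical and are not the plugin's actual code. Values the user set explicitly are left alone, and the original jobspec is not modified.

```python
import copy


def apply_defaults(jobspec, defaults):
    """Return a copy of jobspec with missing system attributes filled in."""
    amended = copy.deepcopy(jobspec)
    system = amended.setdefault("attributes", {}).setdefault("system", {})
    for key, value in defaults.items():
        # setdefault only assigns when the user did not provide the key.
        system.setdefault(key, copy.deepcopy(value))
    return amended


# Hypothetical defaults, as they might be read from configuration.
defaults = {"queue": "batch", "duration": 3600.0}

jobspec = {"attributes": {"system": {"duration": 60.0}}}
amended = apply_defaults(jobspec, defaults)
print(amended["attributes"]["system"])
```

The amended copy is what would be sent to the scheduler feasibility service, while the user's jobspec stays untouched until the jobtap plugin applies the real defaults.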

grondo, Aug 15 '22 21:08

Most likely obvious, but since I didn't immediately think of it: the mechanism to add a constraint based on the selected queue would also have to be implemented in such a stopgap. The algorithm used in the jobspec-defaults jobtap plugin doesn't quite work for this off the shelf, as noted in #4438. I was thinking of possibly adding another plugin to deal with queue constraints as a special case.

garlick, Aug 15 '22 23:08

@grondo had the idea of moving the jobspec updates to the ingest module's validator plugins. That makes a lot of sense: it should scale better since ingest is loaded on all ranks, and sites can provide their own plugins in Python without risking a segfault in the rank 0 broker if they get something wrong.

Since ingest does not write to the eventlog, but is responsible for writing the jobspec to the KVS (and also J, the signed version provided by the user), presumably we would write the jobspec with the changes applied and forgo posting the jobspec-update events. Obviously J cannot be modified, so the two could be compared to see what was modified for debugging/provenance.

garlick, Aug 16 '22 14:08

Another idea might be to amend the job-manager.submit request to take an optional array of jobspec updates, which would then be emitted to the eventlog by the job-manager upon receipt. I had originally assumed that would be necessary for restart, but now that you mention it above I realize I was not correct.
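A rough sketch of the receiving side of that idea, assuming each update is a dict that maps a dotted jobspec path to its new value. The update format, the `apply_updates` helper, and the event layout are assumptions made for illustration, not the job-manager's actual protocol. Each update is applied to a copy of the jobspec, and one event per update is collected for the eventlog.

```python
import copy


def apply_updates(jobspec, updates):
    """Apply dotted-path updates to a copy of jobspec; return (jobspec, events)."""
    result = copy.deepcopy(jobspec)
    events = []
    for update in updates:
        for path, value in update.items():
            *parents, leaf = path.split(".")
            node = result
            for name in parents:
                # Create intermediate objects that do not exist yet.
                node = node.setdefault(name, {})
            node[leaf] = value
        # Hypothetical event layout, one per update in the request.
        events.append({"name": "jobspec-update", "context": update})
    return result, events


# Hypothetical optional array carried in the submit request.
updates = [{"attributes.system.queue": "batch"}]
jobspec = {"attributes": {"system": {"duration": 60.0}}}

updated, events = apply_updates(jobspec, updates)
print(updated["attributes"]["system"], len(events))
```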

I'm not sure if it would be a good or bad thing that flux job info JOBID jobspec would then return the modified jobspec to the user. (Also, job-list would just automatically be reading the amended jobspec.)

grondo, Aug 16 '22 14:08

Seems like it might be a good thing? It does seem more convenient to have the modified one most of the time. And we could add a way to access the original through J if needed.

garlick, Aug 16 '22 16:08

Closed by #4529

garlick, Sep 15 '22 20:09