
harden flux-core processing of jobspec for rabbit storage use case

Status: Open · opened by grondo · 0 comments

Problem: When jobspec is submitted with DWS directives in the CORAL2 environment, a flux-coral2 jobtap plugin will modify it after submission to add the resource information Fluxion needs to schedule rabbits. This may add vertices to the jobspec resources section that flux-core internals do not currently support. Specifically, libjj, a very simple internal flux-core convenience library, may fail with an error when trying to extract its simplified resource counts from such a jobspec.

As a motivating example, for testing purposes @jameshcorbett was submitting a pre-modified jobspec to a Flux system instance with the novalidate flag, and the job was still rejected with the error:

flux-job: Unsupported resource type 'rack'

It turns out that the limit-job-size plugin uses libjj, which was the source of this error.

Since the jobspec will be modified after limits are checked in the real use case, this particular failure is not critical. However, there may be other parts of flux-core that use libjj, so that library should perhaps be made more forgiving when parsing jobspec.
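One possible shape for a "more forgiving" parser is sketched below in Python. This is not libjj itself (which is C), and `get_counts`, `Counts`, `UnsupportedResource`, and the lenient traversal policy are all hypothetical. Only the error text mirrors the message quoted above. The sketch assumes integer counts and that `core`/`gpu` vertices appear inside a `slot`. In strict mode it rejects unknown vertex types. In lenient mode it treats an unknown vertex (such as `rack`) as a grouping level and keeps descending, so nodes nested beneath it are still counted. The example jobspec is illustrative, not the one that was actually submitted.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Vertex types this hypothetical counter models (assumption).
KNOWN_TYPES = {"node", "slot", "core", "gpu"}


class UnsupportedResource(ValueError):
    """Raised in strict mode for vertex types the counter does not model."""


@dataclass
class Counts:
    nnodes: int = 0        # total nodes requested
    nslots: int = 0        # total slots requested
    slot_size: int = 0     # cores per slot (assumes core sits inside slot)
    gpu_per_slot: int = 0  # GPUs per slot (assumes gpu sits inside slot)


def _walk(vertices: List[Dict[str, Any]], mult: int, counts: Counts,
          strict: bool) -> None:
    """Recursively accumulate counts; mult is the product of parent counts."""
    for v in vertices:
        rtype = v.get("type")
        count = int(v.get("count", 1))
        children = v.get("with", [])
        if rtype not in KNOWN_TYPES:
            if strict:
                raise UnsupportedResource(
                    f"Unsupported resource type '{rtype}'")
            # Lenient: descend through the unknown vertex, multiplying by
            # its count; unknown leaves (e.g. storage) contribute nothing.
            _walk(children, mult * count, counts, strict)
        elif rtype == "node":
            counts.nnodes += mult * count
            _walk(children, mult * count, counts, strict)
        elif rtype == "slot":
            counts.nslots += mult * count
            _walk(children, mult * count, counts, strict)
        elif rtype == "core":
            counts.slot_size = count
        elif rtype == "gpu":
            counts.gpu_per_slot = count


def get_counts(jobspec: Dict[str, Any], strict: bool = True) -> Counts:
    """Return simplified resource counts from a jobspec dict (hypothetical)."""
    counts = Counts()
    _walk(jobspec.get("resources", []), 1, counts, strict)
    return counts


# Illustrative jobspec with an extra 'rack' vertex above the nodes.
jobspec = {
    "version": 1,
    "resources": [
        {"type": "rack", "count": 1, "with": [
            {"type": "node", "count": 2, "with": [
                {"type": "slot", "count": 4, "label": "task", "with": [
                    {"type": "core", "count": 2}]}]}]}],
}

try:
    get_counts(jobspec)
except UnsupportedResource as e:
    print(e)  # Unsupported resource type 'rack'

print(get_counts(jobspec, strict=False))
# Counts(nnodes=2, nslots=8, slot_size=2, gpu_per_slot=0)
```

Descending through unknown vertices, rather than skipping their subtrees entirely, keeps node and slot totals meaningful for checks such as job size limits. The trade-off is that a lenient parser silently ignores whatever the unknown vertex means, so callers that need exact semantics would still want strict mode.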

grondo, Mar 04 '24 22:03