flux-sched Fluxion go bindings: job info and resource graph summary

We need to expose additional functionality with our Go bindings to make it easier to work on fluence and debug. Specifically I'd like to be able to:

- Ask for full info on a job from the id. Right now the info function returns a few booleans, time since allocated, and metrics, but really we'd like to see the details of the allocation.
- Tag a match allocate request with a string of interest (in our case a group id) so it can be queried later to look up a jobid. The use case is that we should be able to get a jobid from fluence via the group name and then see the state.
- Get a full listing of current jobs and an allocation summary (e.g., akin to flux jobs).
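To make the three requests above concrete, here is a minimal sketch of the shape such calls could take. Every name here (`JobInfo`, `JobIndex`, `Record`, `Info`, `IDsForTag`, `List`) is hypothetical and not part of the existing Fluxion Go bindings, and the in-memory index is only a stand-in for state that would really live in Fluxion.

```go
package main

import (
	"fmt"
	"sort"
)

// JobInfo is a hypothetical record of what a richer info call might return.
type JobInfo struct {
	ID         int64
	Tag        string // string of interest set at match allocate time, e.g. a group id
	Allocated  bool
	Allocation string // allocation details as returned by the match (format assumed)
}

// JobIndex is an in-memory stand-in for state that Fluxion would own.
type JobIndex struct {
	byID  map[int64]JobInfo
	byTag map[string][]int64
}

// NewJobIndex returns an empty index.
func NewJobIndex() *JobIndex {
	return &JobIndex{byID: map[int64]JobInfo{}, byTag: map[string][]int64{}}
}

// Record stores a job and indexes it by tag the first time the id is seen.
func (j *JobIndex) Record(info JobInfo) {
	if _, seen := j.byID[info.ID]; !seen && info.Tag != "" {
		j.byTag[info.Tag] = append(j.byTag[info.Tag], info.ID)
	}
	j.byID[info.ID] = info
}

// Info returns full details for a job id (request 1).
func (j *JobIndex) Info(id int64) (JobInfo, bool) {
	info, ok := j.byID[id]
	return info, ok
}

// IDsForTag returns a copy of the job ids recorded under a tag (request 2).
func (j *JobIndex) IDsForTag(tag string) []int64 {
	return append([]int64(nil), j.byTag[tag]...)
}

// List returns all known jobs sorted by id (request 3).
func (j *JobIndex) List() []JobInfo {
	jobs := make([]JobInfo, 0, len(j.byID))
	for _, info := range j.byID {
		jobs = append(jobs, info)
	}
	sort.Slice(jobs, func(a, b int) bool { return jobs[a].ID < jobs[b].ID })
	return jobs
}

func main() {
	idx := NewJobIndex()
	idx.Record(JobInfo{ID: 1, Tag: "group-a", Allocated: true, Allocation: "node0"})
	idx.Record(JobInfo{ID: 2, Tag: "group-a", Allocated: true, Allocation: "node1"})
	for _, id := range idx.IDsForTag("group-a") {
		info, _ := idx.Info(id)
		fmt.Printf("job %d (tag %s): %s\n", info.ID, info.Tag, info.Allocation)
	}
}
```

The point is only the lookup paths: id to details, tag to ids, and a full listing. Where the tag is stored and what the allocation details contain are open questions for the actual implementation.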

We need to have https://github.com/flux-framework/flux-sched/pull/1120 merged first, then better error messages added here via a formal PR (https://github.com/flux-framework/flux-sched/issues/1128), and then we can work on these endpoints. Having them will make it much easier to debug fluence, primarily by letting us generate views with our kubectl plugin that show the total resources fluence has against what is allocated (and from that calculate what is available). Right now we are grepping logs, which is very arduous and only gives us the original resource graph. To do better we would have to save the result of each match allocate and pair it with cancel requests to (at best) estimate the current state. I'd rather not do that because it's error-prone.
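For the kubectl plugin view, the arithmetic is simple once totals and current allocations can be queried. A rough sketch, assuming resources can be reduced to per-type counts; the `Resources` type and `Available` function are made up for illustration and are not an existing API:

```go
package main

import "fmt"

// Resources maps a resource type (e.g. "core") to a count.
// This flat representation is an assumption, not the Fluxion graph format.
type Resources map[string]int

// Available subtracts every allocation from the total, per resource type.
func Available(total Resources, allocations []Resources) Resources {
	avail := make(Resources, len(total))
	for kind, count := range total {
		avail[kind] = count
	}
	for _, alloc := range allocations {
		for kind, count := range alloc {
			avail[kind] -= count
		}
	}
	return avail
}

func main() {
	total := Resources{"node": 4, "core": 64}
	allocations := []Resources{
		{"node": 1, "core": 16},
		{"node": 2, "core": 32},
	}
	avail := Available(total, allocations)
	for _, kind := range []string{"node", "core"} {
		fmt.Printf("%s: total=%d available=%d\n", kind, total[kind], avail[kind])
	}
}
```

The hard part is not this subtraction but getting a trustworthy list of current allocations, which is why it should come from Fluxion rather than from replayed logs.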

Jan 22 '24 00:01 vsoch

Also todo: we should add a description of the difference between the Go client and module. Details are in this comment: https://github.com/flux-framework/flux-sched/pull/1133#discussion_r1468797244

Jan 28 '24 09:01 vsoch

It would be cool to have go bindings for flux-core, especially if it worked "natively" with goroutines and so on.

I'd like to propose that, while flux-core is still undergoing pre-1.0 API churn, this should be developed (prototyped?) in an external repo, and that changes in flux-core would not be held back by flux-golang breakage. IOW it would be incumbent on the flux-golang developers to track flux-core, not on the flux-core developers to ensure that any change doesn't break flux-golang.

Jan 28 '24 17:01 garlick

Yep I'm happy to do that! I would just have flux as a submodule or similar and then fwoop it in. Does it matter where I put it? I can't make repos under flux-framework, so likely I'll work on it over in https://github.com/converged-computing.

And side note - I really like the separate design, regardless of the reason for it. I understand it's easier to move things in sync, but for maintaining, and especially with a ton of bindings over time, I think modularity is king. Of course understanding the need to keep things in sync. But usually there are ways to do that with automation.

Jan 28 '24 18:01 vsoch

That said, the python bindings for flux core are pretty essential, so I think they belong in tree. Also, I realized just now this discussion is in the wrong place - I'm going to link it to the issue I opened https://github.com/flux-framework/flux-core/issues/5709. If we have more discussion specific to flux-core let's pick up there.

Jan 28 '24 19:01 vsoch
