Mark Grondona
Some notes from the meeting - Currently node local topology information is lost to Fluxion because we initialize from configured R, which has a flat list of core ids and...
FYI - a user wants to run 2 jobs on a system, each using 4 cores and 4 gpus, but they are hitting a roadblock due to this issue. If...
@garlick tried a simple reproducer (restarting Flux with only rank 0 up, stopping Flux, adding a couple bogus nodes, and starting again) and things worked as expected. So this is...
`resource_reader_rv1exec_t::add_vertex()` initializes the vertex status to `UP`. Is that a problem? https://github.com/flux-framework/flux-sched/blob/fe872c8dc056934e4073b5fb2932335bb69ca73a/resource/readers/resource_reader_rv1exec.cpp#L124 The initial `resource.acquire` response assumes all resources are `DOWN` unless they are in the `up` idset of the...
Nice work @milroy and @jameshcorbett! > This suggests that the default value of status in all readers should be DOWN. I think this will require the current CI tests to...
Correct, the first `resource.acquire` response should only contain an `up` idset. If an id is not in that set, then it should be considered down. Subsequent `resource.acquire` responses can and...
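To make the rule for the initial response concrete, here is a minimal Python sketch, not Flux code. It assumes a simplified idset syntax (comma-separated ids and `lo-hi` ranges, no brackets); `parse_idset` and `initial_status` are hypothetical helper names used only for illustration. Every known rank starts out `DOWN` and is marked `UP` only if it appears in the `up` idset of the first response.

```python
def parse_idset(text):
    """Parse a simplified idset string such as "0-2,5" into a set of ints.

    Assumption: only plain ids and lo-hi ranges separated by commas are
    handled; this is not the real Flux idset decoder.
    """
    ids = set()
    text = text.strip()
    if not text:
        return ids
    for part in text.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return ids


def initial_status(all_ranks, up_idset):
    """Return {rank: "UP" | "DOWN"} for the first acquire response.

    Any rank not named in the up idset is treated as DOWN.
    """
    up = parse_idset(up_idset)
    return {rank: ("UP" if rank in up else "DOWN") for rank in sorted(all_ranks)}


if __name__ == "__main__":
    # Only rank 0 is up; ranks 1-3 were never reported, so they stay DOWN.
    print(initial_status({0, 1, 2, 3}, "0"))
```

Under this reading, a reader that defaults new vertices to `UP` would disagree with the initial response for any rank missing from the `up` idset.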
> I think this line is the problem: BTW, that is why I tried to use `mark_now()` instead of `mark_lazy()` (clearly didn't work though)
> the slurm `libpmi2.so` may be installed standalone via the `libpmi2-0-dev` package. Having a libpmi2 package just seems a bit... wrong?
Some notes offline from @garlick: - It might be nice if the DAT local socket ended up in /run/flux like the system instance one but with a different name. Then...