Dong H. Ahn

Results 31 issues of Dong H. Ahn

The Aha Moles team is getting the following `MPI_Abort` issue. This could be an application issue, but I am still creating this issue ticket as a placeholder to get more...

As documented in https://github.com/flux-framework/flux-sched/blob/master/resource/utilities/README.md and https://github.com/flux-framework/flux-sched/blob/master/resource/traversers/dfu_impl.cpp#L232, our resource graph infrastructure cannot handle a jobspec with sibling resources of the same type, e.g.:

```
cluster[1]->node[1]->core[1]
          ->node[2]->core[40]
```

But the class of jobspec has...

From https://github.com/flux-framework/flux-sched/pull/922#issuecomment-1131135688 > It might be useful to leave some notes or instructions on how one might go about supporting some of the optional RFC 31 constraint operators in the...

The traverser currently doesn't use vertex properties as a match criterion, which hampers our ability to select resources more precisely.
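To make the idea concrete, here is a minimal sketch of matching on vertex properties in addition to resource type. The `Vertex` struct and the `matches` function are hypothetical names used only for illustration; they are not the flux-sched traverser API, and how properties are actually stored on a vertex is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Hypothetical vertex: a resource type plus a set of free-form property tags.
// This is an illustrative stand-in, not the flux-sched resource graph type.
struct Vertex {
    std::string type;
    std::set<std::string> properties;
};

// Return true only if the vertex has the requested type AND carries
// every requested property. std::set keeps its elements sorted, which is
// what std::includes requires of both ranges.
bool matches (const Vertex &v,
              const std::string &type,
              const std::set<std::string> &required)
{
    assert (!type.empty ());  // a request must name a resource type
    if (v.type != type)
        return false;
    return std::includes (v.properties.begin (), v.properties.end (),
                          required.begin (), required.end ());
}
```

With a check like this, a request for a `node` tagged `gpu` would skip nodes that have the right type but lack the tag, which is the more precise selection the issue asks for.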

enhancement

Per https://github.com/flux-framework/flux-sched/pull/922#issuecomment-1131135688

From https://github.com/flux-framework/flux-sched/pull/937#issuecomment-1121351522: There are a few places in the fluxion code that return -1 or jobid to check error conditions, and we have assumed jobid won't exceed the max of...
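The hazard in this kind of pattern can be sketched as follows. Everything here is hypothetical: `jobid_t`, `find_job_sentinel`, and `find_job` are illustrative names, and the 64-bit unsigned id width is an assumption rather than something taken from the fluxion code. The second function shows one possible alternative, keeping the error channel separate from the value with `std::optional`; it is not necessarily the fix the project chose.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Assumed id type for illustration only.
using jobid_t = std::uint64_t;
using job_table = std::unordered_map<std::string, jobid_t>;

// Pattern under discussion: a single signed return value carries either
// -1 (error) or the id. An id above INT64_MAX turns negative when
// converted, so the caller cannot tell it apart from an error.
std::int64_t find_job_sentinel (const job_table &jobs, const std::string &name)
{
    assert (!name.empty ());
    auto it = jobs.find (name);
    if (it == jobs.end ())
        return -1;
    return static_cast<std::int64_t> (it->second);
}

// Alternative sketch: an empty optional signals "not found", so the full
// unsigned range stays available for valid ids.
std::optional<jobid_t> find_job (const job_table &jobs, const std::string &name)
{
    assert (!name.empty ());
    auto it = jobs.find (name);
    if (it == jobs.end ())
        return std::nullopt;
    return it->second;
}
```

A caller of `find_job_sentinel` that tests `< 0` would treat a very large id as a failure, while `find_job` reports it as a valid value.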

As noted at https://github.com/flux-framework/flux-sched/pull/921#discussion_r839756041, this pattern is pervasive within flux-sched, and we want to fix all instances in a single PR.

Modeling after PR #826, we should transition all of the backfill policies and other qmanager-resource RPC patterns to async.

```
FAIL: t7000-shell-datastaging.t 1 - node-local storage allocation staging rank 0
FAIL: t7000-shell-datastaging.t 2 - node-local storage allocation staging rank 1
FAIL: t7000-shell-datastaging.t 3 - cluster-local storage allocation staging rank...
```