
Use vertex properties for matching

Open dongahn opened this issue 5 years ago • 5 comments

The traverser currently doesn't use vertex properties as match criteria, and this hampers our ability to do more precise selection of resources.

dongahn avatar May 05 '20 17:05 dongahn

For example, LLNL has a use case where a cluster can consist of two types of compute nodes: one with 4 GPUs and the other with 8 GPUs. For some job specifications, it would be good to be able to select one type over the other.
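To illustrate the idea, here is a minimal sketch of property-based vertex selection in Python (hypothetical names and data only; the actual Fluxion traverser is C++ and does not yet do this, which is the point of this issue):

```python
# Hypothetical sketch: each node vertex carries a "properties" dict, and a
# jobspec constraint selects only vertices whose properties all match.
def match_properties(vertex_props, requested):
    """Return True if every requested property is present with the same value."""
    return all(vertex_props.get(k) == v for k, v in requested.items())

# Two node types, as in the LLNL use case: 4-GPU and 8-GPU compute nodes.
nodes = [
    {"name": "node0",  "properties": {"node-type": "lean"}},  # 4 GPUs
    {"name": "dnode0", "properties": {"node-type": "fat"}},   # 8 GPUs
]

# Select only the fat (8-GPU) nodes.
fat = [n["name"] for n in nodes
       if match_properties(n["properties"], {"node-type": "fat"})]
```

With such a predicate applied during traversal, a jobspec asking for 8 GPUs per node would never descend into a 4-GPU node vertex.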

dongahn avatar May 05 '20 17:05 dongahn

Please look at the behavior of resource-query on the jobspec generated from flux mini run -N 2 -n 2 -c 18 -g 4 --dry-run xhlp > linpack/spec.N2.n2.c18.g4.yaml

I did some validation of our graph scheduler using the resource-query tool and found that the base case should work more or less out of the box on heterogeneous system configurations like the future Corona. I used a coarse-grained GRUG (resource graph generation recipe), which is attached; the graph representation it produces matched that produced by the hwloc reader. Each GPU and core is modeled as a direct child resource vertex of a compute node vertex. Half of the nodes (150) have 4 GPUs and 36 cores; the other half have 8 GPUs and 36 cores. The jobspecs used were all generated from the flux mini interface:

# node[1]->slot[2]->gpu[4]
#                 ->core[18]
flux mini run -N 1 -n 2 -c 18 -g 4 --dry-run xhlp > linpack/spec.N1.n2.c18.g4.yaml
 
 
# node[1]->slot[2]->gpu[2]
#                 ->core[18]
flux mini run -N 1 -n 2 -c 18 -g 2 --dry-run xhlp > linpack/spec.N1.n2.c18.g2.yaml
 
 
# node[2]->slot[1]->gpu[4]
#                 ->core[18]
flux mini run -N 2 -n 2 -c 18 -g 4 --dry-run xhlp > linpack/spec.N2.n2.c18.g4.yaml
 
 
# slot[450]->gpu[4]
#          ->core[18]
flux mini run -n 450 -c 18 -g 4 --dry-run xhlp > linpack/spec.n450.c18.g4.yaml
 
 
#
# Using low id first match policy and rlite match emit format
ahn1@6e8124a3b9bc:/usr/src/t/data/resource/grugs$ ../../../../resource/utilities/resource-query -L hetero_coarse.graphml -P low -f grug -F rlite
INFO: Loading a matcher: CA
 
 
# node[1]->slot[2]->gpu[4] matches a fatter node (fatter nodes are named dnode*)
#
resource-query> match allocate linpack/spec.N1.n2.c18.g4.yaml
[{"rank": "-1", "node": "dnode0", "children": {"core": "0-35", "gpu": "0-7"}}]
INFO: =============================
INFO: JOBID=1
INFO: RESOURCES=ALLOCATED
INFO: SCHEDULED AT=Now
INFO: =============================
 
 
# node[1]->slot[2]->gpu[2] matches a leaner node (leaner nodes are named node*)
#
# Note that there is currently no guarantee this will only match a leaner node.
# We will need a node-type property added to the property field of each
# node vertex, and to augment our matching to do property matching. (Will create a ticket)
#
resource-query> match allocate linpack/spec.N1.n2.c18.g2.yaml
[{"rank": "-1", "node": "node0", "children": {"core": "0-35", "gpu": "0-3"}}]
INFO: =============================
INFO: JOBID=2
INFO: RESOURCES=ALLOCATED
INFO: SCHEDULED AT=Now
INFO: =============================
 
 
# node[2]->slot[1]->gpu[4] matches one fatter node and one leaner node
# The reason it matched the fatter node is the same as above
resource-query> match allocate linpack/spec.N2.n2.c18.g4.yaml
[{"rank": "-1", "node": "node1", "children": {"core": "0-17", "gpu": "0-3"}}, {"rank": "-1", "node": "dnode1", "children": {"core": "0-17", "gpu": "0-3"}}]
INFO: =============================
INFO: JOBID=3
INFO: RESOURCES=ALLOCATED
INFO: SCHEDULED AT=Now
INFO: =============================
 
 
# Now, running linpack across the entire system:
# slot[450]->gpu[4] doesn't match because of the previous allocations
resource-query> match allocate linpack/spec.n450.c18.g4.yaml
INFO: =============================
INFO: No matching resources found
INFO: JOBID=4
INFO: =============================
 
 
# cancel all three previous jobs
resource-query> cancel 1
resource-query> cancel 2
resource-query> cancel 3
 
 
# Now slot[450]->gpu[4] matches both compute node types and rlite is generated
resource-query> match allocate linpack/spec.n450.c18.g4.yaml
[{"rank": "-1", "node": "node0", "children": {"core": "0-17", "gpu": "0-3"}}, …
INFO: =============================
INFO: JOBID=5
INFO: RESOURCES=ALLOCATED
INFO: SCHEDULED AT=Now
INFO: =============================
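As an aside on the rlite output above: each children entry (e.g. "core": "0-35") is an idset string in Flux's range notation. A minimal parser, sketched here as a hypothetical helper (flux-core's libidset handles the full format), shows what these strings expand to:

```python
def parse_idset(s):
    """Expand a simple idset string like "0-17,20" into a sorted list of ids.
    Minimal sketch only; it handles comma-separated ids and lo-hi ranges."""
    ids = []
    for part in s.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            ids.append(int(part))
    return ids

# "0-35" in the first match above means all 36 cores of a node.
core_ids = parse_idset("0-35")
```

So the fatter-node match allocated cores 0 through 35 and GPUs 0 through 7 of dnode0.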

dongahn avatar May 05 '20 17:05 dongahn

GRUG used above: hetero_coarse.graphml.txt

dongahn avatar May 05 '20 17:05 dongahn

PR #693 has expression evaluation support, which can serve as a model for the solution to this issue.
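For illustration only, property matching driven by a boolean expression could look roughly like the sketch below (hypothetical syntax and evaluator; not the actual PR #693 implementation, which lives in flux-sched's C++ code base):

```python
def _term(term, props):
    """Evaluate one key=value test against a vertex's property dict."""
    key, _, val = term.partition("=")
    return props.get(key.strip()) == val.strip()

def eval_expr(expr, props):
    """Evaluate a flat expression such as "node-type=fat and arch=x86",
    with 'and'/'or' connectives only (no parentheses or precedence)."""
    for disjunct in expr.split(" or "):
        if all(_term(t, props) for t in disjunct.split(" and ")):
            return True
    return False

# A fat node's properties satisfy either branch of this request.
ok = eval_expr("node-type=lean or node-type=fat", {"node-type": "fat"})
```

An evaluator like this, hooked into the traverser's match callback, would let a jobspec express constraints over the node-type property discussed above.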

dongahn avatar Jul 12 '20 21:07 dongahn

This is being addressed by PR #922.

dongahn avatar May 19 '22 03:05 dongahn