DRA: Explicit DeviceClass Fields
Enhancement Description
Currently, a DeviceClass can be satisfied by more than one DRA driver. The only link between a DeviceClass and a driver is in the CEL selectors of the DeviceClass. While this is very flexible, it prevents the scheduler from assuming that a given class is satisfiable by only one driver. The idea here is to allow driver authors and administrators to provide explicit guidance to the scheduler so that it can reason more efficiently while searching the set of devices that might satisfy a request.
This could also be helpful for deciding which resource pools should be visible in commands such as `kubectl describe node`.
- One-line enhancement description (can be used as a release note): DeviceClass now contains optional fields that let administrators tell the scheduler which driver backs the class and whether its devices are node local, enabling more efficient scheduling.
- Kubernetes Enhancement Proposal: TBD
- Discussion Link:
- https://github.com/kubernetes/kubernetes/issues/134986
- https://github.com/kubernetes/enhancements/issues/5491#issuecomment-3474111071
- PRs by stage and milestone:
- [ ] Alpha - v1.xx
  - [ ] KEP (k/enhancements) update PR(s):
  - [ ] Code (k/k) update PR(s):
  - [ ] Docs (k/website) update PR(s):
Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.
/wg device-management
/sig scheduling

cc @pohly @klueska @bwsalmon @erictune
cc @mortent
I agree that this is a useful optimization. Hindsight, etc. ...
My thinking here is related to how we evaluate requests. Since every subrequest has exactly one DeviceClass, I am wondering if we can use the global DeviceClass cache to index into the specific resource pools that could possibly satisfy a given subrequest. If so, this might allow us to make faster scheduling decisions (this would need to be proven).
Similarly, when we allocate ResourceClaims at the workload level as part of workload-aware scheduling, we want to aggregate resources in a topology domain, and knowing that a given DeviceClass is node local allows us to aggregate by DeviceClass. Things get tricky if the DeviceClass is not independent: for example, if it uses partitionable logic, changes in the counts of one DeviceClass can reduce the counts of other DeviceClasses. That is tricky when multiple requests in the same Pod use those overlapping DeviceClasses. Other than that, this aggregation could be useful in the initial fit/pruning calculations to shortcut some searching of the solution space. See https://docs.google.com/document/d/1Fg9ughIRMtt1HmDqiGWV-w9OKdrcKf_PsH4TjuP8Y40/edit?tab=t.0#bookmark=id.ll792o1mpgt8
cc @dom4ha
> I agree that this is a useful optimization. Hindsight, etc. ...
For sure. But even if these are optional fields, if people usually set them, we can get most of the benefit from them in practice, even though in theory someone can still bypass them.
/cc