Support for Resource Claims and DRA
Describe the feature
As the Kubernetes community is actively moving to resource claims, it would be great to add support for them at some point (adding this for future reference). The feature has been Alpha since 1.26 and will stay Alpha in 1.31, but it is being worked on actively (see https://github.com/kubernetes/enhancements/issues/3063#issuecomment-1915852197), so it seems likely to move to Beta and then GA soon.
In Knative, trying to set a resource claim currently fails validation, as expected:
Error from server (BadRequest): error when creating "service.yaml": admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.containers[0].resources.claims
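For reference, a manifest along the following lines should be enough to hit the validation error above. This is only a minimal sketch, not the exact `service.yaml` from the report: the service name, image, and claim name are placeholders, and only the rejected field (`resources.claims` on the first container) is shown.

```yaml
# service.yaml (hypothetical reproduction; name, image and claim name are placeholders)
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: dra-example                      # placeholder name
spec:
  template:
    spec:
      containers:
        - image: example.com/app:latest  # placeholder image
          resources:
            claims:                      # the field the Knative validation webhook rejects
              - name: gpu                # placeholder claim name
```

On a plain Kubernetes Pod, a container-level claim like this also has to reference a pod-level `resourceClaims` entry, whose exact shape depends on the Kubernetes version. That part is left out here because the reported error only mentions the container field.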
/area API
References
- Unleashing the Power of DRA (Dynamic Resource Allocation) for Just-in-Time GPU Slicing
- What Can I Get You? An Introduction to Dynamic Resource Allocation - Freddy Rolland & Adrian Chiris
- Deploy vLLM server on Kubernetes using NVIDIA Kubernetes DRA driver
- KCSEU 2024 - Dynamic Resource Allocation - the path towards GA - Kevin Klues & Patrick Ohly
- Meeting notes from K8s Serving WG
- K8s issues/KEPs:
  - Dynamic Resource Allocation with Control Plane Controller
  - DRA: structured parameters