[proto]: bolt a DRA driver frontend on the topology-aware policy.

Open klihub opened this issue 6 months ago • 0 comments

This prototype patch set bolts a DRA allocation frontend on top of the existing topology-aware resource policy plugin. The main intentions of this patch set are to

  • provide something practical to play around with for the feasibility study of enabling DRA-based CPU allocation,
  • allow (relatively) easy experimentation with how to expose CPUs as DRA devices (in other words, test various CPU DRA attributes), and
  • allow testing how DRA-based CPU allocation (using non-trivial CEL expressions) would scale with cluster and cluster node size.

Note: This patched NRI plugin, especially in its current state and form, is not a proposal for a first real DRA-based CPU driver.

If you want to play around with this (for instance modify the exposed CPU abstraction), the easiest way is to

  1. fork the main NRI Reference Plugins repo
  2. enable GitHub Actions in your personal fork
  3. make any changes you want (for instance, to alter the CPU abstraction, take a look at cpu.DRA())
  4. Push your changes to ssh://git@github.com/$YOUR_FORK/nri-plugins/refs/heads/test/build/dra-driver
  5. Wait for the image and Helm chart publishing actions to succeed
  6. Once done, you can pull the result into your cluster with something like
       helm install --devel -n kube-system test oci://ghcr.io/$YOUR_GITHUB_USERID/nri-plugins/helm-charts/nri-resource-policy-topology-aware --version v0.9-dra-driver-unstable
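
If you rebuild and reinstall often, it may be handy to script the install step. The following is only a minimal sketch: the helper name helm_install_cmd and the placeholder user ID example-user are made up for illustration, and the defaults simply mirror the command in step 6.

```python
import shlex


def helm_install_cmd(github_user, release="test", namespace="kube-system",
                     version="v0.9-dra-driver-unstable"):
    """Return the helm argv for installing the chart published by a fork.

    Hypothetical helper; the chart path and flags mirror step 6 above.
    """
    chart = (
        f"oci://ghcr.io/{github_user}/nri-plugins/helm-charts/"
        "nri-resource-policy-topology-aware"
    )
    return ["helm", "install", "--devel", "-n", namespace, release,
            chart, "--version", version]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(helm_install_cmd("example-user")))
```

The argv list can be handed to subprocess.run() as-is once you are happy with it.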

You can then test if things work with something like

apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: any-cores
spec:
  spec:
    devices:
      requests:
      - name: cpu
        deviceClassName: native.cpu
---
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: p-cores
spec:
  spec:
    devices:
      requests:
      - name: cpu
        deviceClassName: native.cpu
        selectors:
          - cel:
              expression: device.attributes["native.cpu"].coreType == "P-core"
        count: 1
---
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: e-cores
spec:
  spec:
    devices:
      requests:
      - name: cpu
        deviceClassName: native.cpu
        selectors:
          - cel:
              expression: device.attributes["native.cpu"].coreType == "E-core"
        count: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: pcore-test
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: busybox
    imagePullPolicy: IfNotPresent
    args:
      - /bin/sh
      - -c
      - trap 'exit 0' TERM; sleep 3600 & wait
    resources:
      requests:
        cpu: 1
        memory: 100M
      limits:
        cpu: 1
        memory: 100M
      claims:
      - name: claim-pcores
  resourceClaims:
  - name: claim-pcores
    resourceClaimTemplateName: p-cores
  terminationGracePeriodSeconds: 1
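
The three claim templates above differ only in their name and CEL selector, so for experimenting with other selectors it may be easier to generate them. Below is a minimal sketch under stated assumptions: the helper names claim_template and render are hypothetical, the native.cpu device class and the coreType attribute are taken from the manifests above, and the output is JSON wrapped in a v1 List, which kubectl accepts in place of YAML.

```python
import json


def claim_template(name, core_type=None, count=1,
                   device_class="native.cpu", domain="native.cpu"):
    """Build a ResourceClaimTemplate dict shaped like the manifests above.

    Hypothetical helper. core_type=None gives an unconstrained request,
    matching the 'any-cores' template.
    """
    request = {"name": "cpu", "deviceClassName": device_class}
    if core_type is not None:
        # Same CEL expression shape as in the p-cores / e-cores templates.
        expr = f'device.attributes["{domain}"].coreType == "{core_type}"'
        request["selectors"] = [{"cel": {"expression": expr}}]
        request["count"] = count
    return {
        "apiVersion": "resource.k8s.io/v1beta1",
        "kind": "ResourceClaimTemplate",
        "metadata": {"name": name},
        "spec": {"spec": {"devices": {"requests": [request]}}},
    }


def render(templates):
    """Serialize the templates as a single JSON List object."""
    return json.dumps(
        {"apiVersion": "v1", "kind": "List", "items": templates}, indent=2
    )


if __name__ == "__main__":
    print(render([
        claim_template("any-cores"),
        claim_template("p-cores", "P-core"),
        claim_template("e-cores", "E-core"),
    ]))
```

Saved as, say, gen_claims.py (the file name is arbitrary), the output can be piped to kubectl apply -f - in the namespace where the test pod will run.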

klihub, Jun 14 '25 08:06