
k8s-1.30: Enable feature-gate of DRA alpha feature

victortoso opened this pull request 1 year ago • 10 comments

What this PR does / why we need it:

This is related to ongoing projects to test Dynamic Resource Allocation (DRA) in KubeVirt:

https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/

I'm enabling it in k8s-1.30 to support the PCI passthrough driver being developed by Sibasish Behera as part of his GSoC 2024 project; see:

https://github.com/kubevirt/dra-pci-driver
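Concretely, the gate has to reach the API server, controller-manager, and scheduler through kubeadm. A minimal sketch of what this boils down to (hypothetical heredoc for illustration; the real change patches the existing kubeadm.conf in place, and per the upstream DRA docs the API server must also serve the alpha resource.k8s.io API group):

# Sketch only: the kubeadm.conf fragment this change effectively produces;
# the real file nests these entries under its existing keys.
cat <<'EOF' | sudo tee -a /etc/kubernetes/kubeadm.conf
apiServer:
  extraArgs:
    feature-gates: DynamicResourceAllocation=true
    runtime-config: resource.k8s.io/v1alpha2=true
controllerManager:
  extraArgs:
    feature-gates: DynamicResourceAllocation=true
scheduler:
  extraArgs:
    feature-gates: DynamicResourceAllocation=true
EOF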

Special notes for your reviewer:

This patch mimics what was previously done by hand in the dra-pci-driver, as described here.

I validated that it works by checking that resourceclasses exist within this image.

When the Node is up:

toso@tapioca ~/s/k/kubevirt (main)> cluster-up/kubectl.sh get resourceclasses
selecting docker as container runtime
No resources found

After installing dra-pci-driver

toso@tapioca ~/s/k/kubevirt (main)> cluster-up/kubectl.sh get resourceclasses
selecting docker as container runtime
NAME              DRIVERNAME                 AGE
pci.kubevirt.io   pci.resource.kubevirt.io   4m23s
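For context, the object behind that output is tiny. Creating the same ResourceClass by hand would look roughly like this (a sketch, reconstructed from the NAME and DRIVERNAME columns above, assuming the classic-DRA resource.k8s.io/v1alpha2 API that k8s-1.30 serves):

# Sketch: the ResourceClass the dra-pci-driver creates, reconstructed
# from the kubectl output above.
cluster-up/kubectl.sh apply -f - <<'EOF'
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: pci.kubevirt.io
driverName: pci.resource.kubevirt.io
EOF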

Checklist

This checklist is not enforced, but it's a reminder of items that could be relevant to every PR. Approvers are expected to review this list.

Release note:

Enable Dynamic Resource Allocation (DRA) alpha feature in k8s-1.30

victortoso avatar Aug 20 '24 19:08 victortoso

/cc @TheRealSibasishBehera @alicefr

The demo is not yet fully working, as I don't see the emulated NVMe devices as allocatable yet. Marking as draft until I figure that out.

victortoso avatar Aug 20 '24 19:08 victortoso

The demo is not yet fully working, as I don't see the emulated NVMe devices as allocatable yet.

Seems to be an issue with the registration between the dra-pci-driver and the kubelet. Not sure yet what's wrong :)

After the cluster is up, I can see that most configs are correct, e.g.:

toso@tapioca ~/s/k/kubevirt (main)> ./cluster-up/ssh.sh node01 "sudo grep -ri \"DynamicResourceAllocation\" /etc/kubernetes/"
selecting docker as container runtime
/etc/kubernetes/manifests/kube-scheduler.yaml:    - --feature-gates=DynamicResourceAllocation=true
/etc/kubernetes/manifests/kube-controller-manager.yaml:    - --feature-gates=DynamicResourceAllocation=true
/etc/kubernetes/manifests/kube-apiserver.yaml:    - --feature-gates=DynamicResourceAllocation=true
/etc/kubernetes/kubeadm.conf:    feature-gates: DynamicResourceAllocation=true
/etc/kubernetes/kubeadm.conf:    feature-gates: DynamicResourceAllocation=true
/etc/kubernetes/kubeadm.conf:    feature-gates: DynamicResourceAllocation=true

It seems the feature gate is missing from kubelet's /etc/kubernetes/kubelet.conf, and the same goes for /var/lib/kubelet/config.yaml, the path mentioned in the documentation:

toso@tapioca ~/s/k/kubevirt (main)> ./cluster-up/ssh.sh node01 "sudo grep -ri \"DynamicResourceAllocation\" /var/lib/kubelet"
selecting docker as container runtime
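If that's the gap, the kubelet-side fix should be small, something along these lines (a sketch, run on node01, assuming config.yaml has no featureGates block yet):

# Sketch: add the missing kubelet-side gate and restart kubelet.
# Assumes /var/lib/kubelet/config.yaml has no featureGates key yet.
cat <<'EOF' | sudo tee -a /var/lib/kubelet/config.yaml
featureGates:
  DynamicResourceAllocation: true
EOF
sudo systemctl restart kubelet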

Bringing the dra-pci-driver up, as mentioned in the first comment, shows the resourceclass just fine, but registration seems not to happen:

toso@tapioca ~/s/k/d/demo (main)> kubectl get resourceclasses
NAME              DRIVERNAME                 AGE
pci.kubevirt.io   pci.resource.kubevirt.io   6s

toso@tapioca ~/s/k/kubevirt (main)> k logs -v 10 --all-containers -n dra-pci-driver dra-pci-driver-kubeletplugin-49hxt -f
E0821 07:14:04.828142       1 nonblockinggrpcserver.go:125] "handling request failed" err="failed registration process: RegisterPlugin error -- no handler registered for plugin type: DRAPlugin at socket /var/lib/kubelet/plugins_registry/pci.resource.kubevirt.io.sock" logger="registrar" requestID=12 request="&RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- no handler registered for plugin type: DRAPlugin at socket /var/lib/kubelet/plugins_registry/pci.resource.kubevirt.io.sock,}"
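That error is consistent with kubelet itself running without the gate: the registrar creates the socket, but kubelet's plugin watcher only registers a handler for the DRAPlugin type when DynamicResourceAllocation is enabled on kubelet. The socket itself is there (quick check, for the record):

# Quick check: the registrar socket exists even though registration fails.
./cluster-up/ssh.sh node01 "ls /var/lib/kubelet/plugins_registry/"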

Changing the kubelet's config and restarting it did not solve it. Registration shows no error anymore, but I still could not see the NVMe resources on the node:

[vagrant@node01 ~]$ sudo vi /etc/kubernetes/kubelet.conf
[vagrant@node01 ~]$ sudo vi /var/lib/kubelet/config.yaml
[vagrant@node01 ~]$ sudo systemctl daemon-reload
[vagrant@node01 ~]$ sudo systemctl restart kubelet.service
toso@tapioca ~/s/k/kubevirt (main)> k logs --max-log-requests 10 -v 10 --all-containers -n dra-pci-driver dra-pci-driver-kubeletplugin-w6szb -f
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:00.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:01.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:02.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:03.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:04.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:05.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:06.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:07.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:08.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:09.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:1f.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:1f.2
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:1f.3
I0821 07:18:57.603631       1 nonblockinggrpcserver.go:105] "GRPC server started" logger="dra"
I0821 07:18:57.603736       1 nonblockinggrpcserver.go:105] "GRPC server started" logger="registrar"

For this PR, I think kubelet's config needs to be set as well, otherwise the registration will not work. I might still be missing something else; otherwise I'd see the resources on the node.

victortoso avatar Aug 21 '24 07:08 victortoso

Thanks @victortoso for working on this. As I mentioned in the setup doc, we will need to change the kube-apiserver, kube-controller-manager, kube-scheduler, and kubelet configuration.

And before deploying the dra-pci-driver, we have to bind the NVMe devices to the vfio driver. For now, there are also some permission issues with SELinux, so while trying it out use sudo setenforce 0.
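For reference, binding one of the emulated NVMe devices to vfio by hand looks roughly like this (a sketch using the driver_override mechanism; the address is an example, matching one of the NVMe devices that shows up later in this thread):

# Sketch (run on node01): unbind the device from its current driver
# (skip if unbound), then bind it to vfio-pci via driver_override.
sudo modprobe vfio-pci
echo 0000:00:07.0 | sudo tee /sys/bus/pci/devices/0000:00:07.0/driver/unbind
echo vfio-pci | sudo tee /sys/bus/pci/devices/0000:00:07.0/driver_override
echo 0000:00:07.0 | sudo tee /sys/bus/pci/drivers/vfio-pci/bind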

To verify that the devices are allocatable, check the NAS CRD: kubectl describe nas node01 -n dra-pci-driver

TheRealSibasishBehera avatar Aug 21 '24 15:08 TheRealSibasishBehera

To verify that the devices are allocatable, check the NAS CRD: kubectl describe nas node01 -n dra-pci-driver

toso@tapioca ~/s/k/kubevirt (main) [1]> k describe nas node01 -n dra-pci-driver
Name:         node01
Namespace:    dra-pci-driver
Labels:       <none>
Annotations:  <none>
API Version:  nas.pci.resource.kubevirt.io/v1alpha1
Kind:         NodeAllocationState
Metadata:
  Creation Timestamp:  2024-08-21T15:36:34Z
  Generation:          4
  Owner References:
    API Version:     v1
    Kind:            Node
    Name:            node01
    UID:             d6f18aca-1bad-4235-b6f0-166251301266
  Resource Version:  4366
  UID:               c0c20bbb-e875-421c-ad48-1b7a0fc1abde
Spec:
  Allocatable Devices:
    Pci:
      Pci Address:    0000:00:07.0
      Resource Name:  devices.kubevirt.io/nvme
      Uuid:           98b11d5d-db05-4a2a-9f7d-7d5c2b5edce3
    Pci:
      Pci Address:    0000:00:08.0
      Resource Name:  devices.kubevirt.io/nvme
      Uuid:           1b95adf6-2c1e-41b0-b2a7-47fac8f03bca
Status:               Ready
Events:               <none>

victortoso avatar Aug 21 '24 15:08 victortoso

To verify that the devices are allocatable, check the NAS CRD: kubectl describe nas node01 -n dra-pci-driver

I was looking in the wrong place, with kubectl describe node node01 -n dra-pci-driver. Thanks! Now that kubelet's config has been patched to include the feature gate, all bits should be in place for the kubevirtci k8s-1.30 image.

victortoso avatar Aug 21 '24 15:08 victortoso

/approve

dhiller avatar Aug 21 '24 16:08 dhiller

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhiller

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

kubevirt-bot avatar Aug 21 '24 16:08 kubevirt-bot

@victortoso do we want to have this always on? I know it is more work, but it would be good to have a KUBEVIRTCI env variable which enables this when we spin up the cluster. Although I'm not sure how to do a configurable kubeadm.conf; maybe using the default one, adding the additional option on the fly, and overwriting the configuration?

alicefr avatar Aug 22 '24 07:08 alicefr

@victortoso do we want to have this always on? I know it is more work, but it would be good to have a KUBEVIRTCI env variable which enables this when we spin up the cluster. Although I'm not sure how to do a configurable kubeadm.conf; maybe using the default one, adding the additional option on the fly, and overwriting the configuration?

Yeah, you probably don't want all the other lanes to test the k8s beast with DRA enabled; it might hurt the integrity of what we end up testing, with DRA being the huge feature it is.

akalenyu avatar Aug 22 '24 08:08 akalenyu

/hold

alicefr avatar Aug 22 '24 08:08 alicefr

Sorry, looking at this again.

I'm not sure how to do a configurable kubeadm.conf; maybe using the default one, adding the additional option on the fly, and overwriting the configuration?

I'm looking at what was done with KUBEVIRT_CPU_MANAGER_POLICY here, and we might be able to do it in a similar way. Probably a small refactor too, so we do both changes at once in case we want to enable both DRA and CPU_MANAGER_POLICY.

Now, looking at the comment there, it feels like we can use drop-in config files already. This should reduce the seds, while adding a few extra kubeadm.conf files and running kubelet with a --config-dir parameter by default.
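With drop-ins, the whole thing could collapse to a single extra file. A sketch of what I have in mind (the directory and file name are hypothetical; --config-dir requires .conf files containing a KubeletConfiguration document):

# Sketch: a kubelet drop-in enabling DRA, assuming kubelet is started
# with --config-dir=/etc/kubernetes/kubelet.conf.d (hypothetical path).
sudo mkdir -p /etc/kubernetes/kubelet.conf.d
cat <<'EOF' | sudo tee /etc/kubernetes/kubelet.conf.d/90-dra.conf
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DynamicResourceAllocation: true
EOF
sudo systemctl restart kubelet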

@lyarwood does that sound reasonable? Any suggestions?

Cheers,

victortoso avatar Oct 04 '24 14:10 victortoso

Sorry, looking at this again.

I'm not sure how to do a configurable kubeadm.conf; maybe using the default one, adding the additional option on the fly, and overwriting the configuration?

I'm looking at what was done with KUBEVIRT_CPU_MANAGER_POLICY here, and we might be able to do it in a similar way. Probably a small refactor too, so we do both changes at once in case we want to enable both DRA and CPU_MANAGER_POLICY.

Now, looking at the comment there, it feels like we can use drop-in config files already. This should reduce the seds, while adding a few extra kubeadm.conf files and running kubelet with a --config-dir parameter by default.

@lyarwood does that sound reasonable? Any suggestions?

Yes, if we can get away with using drop-ins now, we should do so. KUBEVIRT_CPU_MANAGER_POLICY needs to get reworked anyway, as it's duplicating the launch options we currently pass to kubelet during the provision step.

https://github.com/kubevirt/kubevirtci/blob/16d40d2a563a61e866cc409c0b2cf1ef66f3c4a4/cluster-provision/gocli/opts/nodes/nodes.go#L39

I thought I had captured this in a follow-up issue but I can't seem to find it; I'll create one now.

lyarwood avatar Oct 07 '24 08:10 lyarwood

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot avatar Jan 05 '25 08:01 kubevirt-bot

Closing in favor of https://github.com/kubevirt/kubevirtci/pull/1345

victortoso avatar Jan 14 '25 09:01 victortoso