Kata2.0 + Containerd using gpu in K8S 1.7

Open han2ni3bal opened this issue 3 years ago • 7 comments

Before raising this question

My cluster originally used Docker as the container runtime. I switched the runtime on some nodes to containerd and then set up Kata 2.0 on those nodes. In this setup, starting the container with Nvidia docker 2.0 seems impossible, because it requires the runtime to be runc while we need it to be kata, and Nvidia docker 1.0 does not support k8s 1.8 either. How can I make Kata 2.0 containers use an Nvidia GPU normally?

han2ni3bal avatar Dec 29 '21 10:12 han2ni3bal

@han2ni3bal - You are correct that you cannot use docker with Kata 2.x (see https://github.com/kata-containers/kata-containers/issues/3417).

However, have you tried going through the steps in:

  • https://github.com/kata-containers/kata-containers/blob/main/docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md

... and replacing:

$ docker run --device /dev/vfio/...

... with:

$ ctr run --device /dev/vfio/...

jodh-intel avatar Jan 12 '22 09:01 jodh-intel

@jodh-intel Thank you very much for your kind help. We are going to try GPU pass-through mode with Kata 2.0 this way. What concerns me now is how to combine Kata 2.0 with a K8s device plugin in our cluster. Our current idea is to base it on the KubeVirt device plugin, because KubeVirt and Kata both use VFIO with IOMMU. However, after checking the KubeVirt device plugin code, I found that it discovers GPU devices through the vfio-pci driver, which Kata does not use. So I need to figure out how to register the GPU devices used by Kata 2.0 with the cluster. Do you have any suggestions? Thank you very much!
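For reference, the driver-based discovery described above can be sketched roughly as follows. This is a minimal Python sketch, assuming the standard Linux sysfs layout (`/sys/bus/pci/devices/<address>/` with a `vendor` file and `driver` and `iommu_group` symlinks) and NVIDIA's PCI vendor ID `0x10de`. The function name `find_vfio_gpus` and the `sysfs_root` parameter are hypothetical and exist only for illustration; this is not the KubeVirt plugin's actual code.

```python
import os
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA


def find_vfio_gpus(sysfs_root="/sys"):
    """Return (pci_address, iommu_group) pairs for NVIDIA PCI devices
    that are currently bound to the vfio-pci driver.

    sysfs_root is a hypothetical parameter so the function can be
    pointed at a fake tree for testing.
    """
    found = []
    devices_dir = Path(sysfs_root) / "bus" / "pci" / "devices"
    if not devices_dir.is_dir():
        return found
    for dev in sorted(devices_dir.iterdir()):
        vendor_file = dev / "vendor"
        driver_link = dev / "driver"
        group_link = dev / "iommu_group"
        # Devices with no bound driver have no "driver" symlink.
        if not vendor_file.is_file() or not driver_link.is_symlink():
            continue
        if vendor_file.read_text().strip() != NVIDIA_VENDOR_ID:
            continue
        # The "driver" symlink points at the bound driver's directory.
        if os.path.basename(os.readlink(driver_link)) != "vfio-pci":
            continue
        group = None
        if group_link.is_symlink():
            group = os.path.basename(os.readlink(group_link))
        found.append((dev.name, group))
    return found


if __name__ == "__main__":
    for address, group in find_vfio_gpus():
        # VFIO exposes each IOMMU group as /dev/vfio/<group>.
        print(f"{address} -> /dev/vfio/{group}")
```

The IOMMU group number is what names the `/dev/vfio/<group>` node, so a device plugin built along these lines would presumably advertise one resource per group and hand that node to the runtime. Whether Kata 2.0 can consume devices registered this way is exactly the open question here.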

han2ni3bal avatar Jan 13 '22 01:01 han2ni3bal

I have the same question about how to combine Kata 2.0 with a K8s device plugin in our cluster. Does NVIDIA/k8s-device-plugin no longer work in this environment, meaning we need to develop a device plugin for Kata 2.x? What about Kata 1.x?

fighterhit avatar Jan 13 '22 02:01 fighterhit

Does anyone in the community with GPU devices want to comment here?

Maybe @Jimmy-Xu or @flx42 have thoughts on this?

/cc @egernst, @dgibson.

jodh-intel avatar Jan 13 '22 09:01 jodh-intel

@fighterhit I think we can use the KubeVirt device plugin with Kata 2.0, but I still need to test it.

han2ni3bal avatar Jan 24 '22 01:01 han2ni3bal

Thank you for your advice, @han2ni3bal. If you make any progress on this, please let me know if you don't mind. 😀

fighterhit avatar Jan 25 '22 03:01 fighterhit

Hi @han2ni3bal, have you tested the KubeVirt GPU device plugin? When I use the latest version of the device plugin, the following error is reported:

failed to create containerd task: failed to create shim: QMP command failed: The device is not writable: Permission denied: unknown 

My kata is v2.3.2 and containerd is v1.5.9.

fighterhit avatar Feb 22 '22 05:02 fighterhit