Christopher Desiniotis
@jcstryker Could you provide some more detail? Are there any driver logs you can provide? I brought up a disconnected environment with a repository mirror, but was not able to...
I would advise reading up on the device plugin framework, which should help you understand its motivation, use cases, and advantages: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md

> Containers have all they need to work with...
@neggert can you provide detail on the GPU pods running on the node and the workloads they are running? If they are actively using the GPU, we expect `k8s-driver-manager` to...
Thanks for the additional info. If all GPU pods are idle at the time of driver upgrade (there are no active GPU driver clients), then it makes sense why you...
@wawa0210 the logs you provided suggest that mig-manager successfully applied the `all-1g.10gb` configuration. What does `nvidia-smi` show? And when you describe the node, do you see the expected number of...
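A minimal sketch for checking what the node actually advertises, assuming you can run `kubectl get node <name> -o json`. It keeps only `nvidia.com/*` entries from the node's `capacity` and `allocatable`. The expected resource names depend on the device plugin's MIG strategy: with `mixed` they appear as per-profile names such as `nvidia.com/mig-1g.10gb`, and with `single` they appear as `nvidia.com/gpu`. The helper names `nvidia_resources` and `get_node` are hypothetical.

```python
import json
import subprocess
import sys
from typing import Dict


def nvidia_resources(node: dict) -> Dict[str, Dict[str, str]]:
    """Collect nvidia.com/* entries from a node's capacity and allocatable."""
    status = node.get("status", {})
    result: Dict[str, Dict[str, str]] = {}
    for section in ("capacity", "allocatable"):
        resources = status.get(section, {})
        # Resource quantities are serialized as strings in the node JSON.
        result[section] = {
            name: qty
            for name, qty in resources.items()
            if name.startswith("nvidia.com/")
        }
    return result


def get_node(name: str) -> dict:
    """Fetch a node object via kubectl (assumes kubectl is configured)."""
    out = subprocess.run(
        ["kubectl", "get", "node", name, "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: check_mig.py <node-name>")
    print(json.dumps(nvidia_resources(get_node(sys.argv[1])), indent=2))
```

If MIG was applied but only `nvidia.com/gpu` (or nothing) shows up, compare the output against the MIG strategy the device plugin was deployed with.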
`No devices were found` typically indicates the driver failed to initialize. Can you collect system logs by running `dmesg | grep -i nvrm` on the host?
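If a shell pipeline is awkward to run on the host (for example, from a script that collects diagnostics), the same filter can be sketched in Python. This is a minimal equivalent of `dmesg | grep -i nvrm`: it reads the kernel ring buffer and keeps lines containing `nvrm` in any case. Reading `dmesg` may require root on some hosts; the function names are hypothetical.

```python
import subprocess
import sys
from typing import Iterable, List


def filter_nvrm(lines: Iterable[str]) -> List[str]:
    """Return lines containing 'nvrm', ignoring case (like `grep -i nvrm`)."""
    return [line for line in lines if "nvrm" in line.lower()]


def read_kernel_log() -> List[str]:
    """Read the kernel ring buffer via `dmesg`; may need root privileges."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    try:
        matches = filter_nvrm(read_kernel_log())
    except (OSError, subprocess.CalledProcessError) as exc:
        sys.exit(f"could not read kernel log: {exc}")
    print("\n".join(matches) if matches else "no NVRM messages found")
```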
Yes you are right, we do have multiple components which have their own `k8s-driver-manager` configuration. This could be unified with some refactoring. If you are interested, contributions are welcome on...
> unable to mount the GPU due to an error related to the absence of library(s) found in "LIBDEVICE" folder. (LIBDEVICE folder is missing)

What does running `nvidia-smi` from...
> is NVIDIA planning to include this library in NVIDIA GPU operator installation ?

No. The GPU Operator installs the NVIDIA GPU kernel driver, which consists of the kernel module...
@igorgad you do not need to manually mount `/dev/shm` in your pod spec. The device-plugin, as part of its AllocateResponse, will make sure all the entities required for MPS get...
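To illustrate why no manual `/dev/shm` volume is needed: in the kubelet device plugin API, the plugin's `Allocate` call returns a `ContainerAllocateResponse` per container, carrying environment variables, mounts, and device specs that the kubelet then applies to the container. The sketch below mirrors only the `envs` and `mounts` fields (and the `Mount` message's `container_path`, `host_path`, `read_only`) as plain Python dataclasses; it is not the real gRPC types. The host paths are hypothetical placeholders, and whether the NVIDIA plugin sets `CUDA_MPS_PIPE_DIRECTORY` this way is an assumption (that variable and its default `/tmp/nvidia-mps` come from CUDA MPS itself).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Mount:
    # Mirrors the device plugin API Mount message fields.
    container_path: str
    host_path: str
    read_only: bool = False


@dataclass
class ContainerAllocateResponse:
    # Simplified: devices, annotations, and CDI devices are omitted.
    envs: Dict[str, str] = field(default_factory=dict)
    mounts: List[Mount] = field(default_factory=list)


def build_mps_response(
    shm_host_path: str,
    pipe_host_dir: str,
    pipe_container_dir: str = "/tmp/nvidia-mps",
) -> ContainerAllocateResponse:
    """Hypothetical response injecting what an MPS client needs."""
    return ContainerAllocateResponse(
        # MPS clients locate the control daemon through this directory.
        envs={"CUDA_MPS_PIPE_DIRECTORY": pipe_container_dir},
        mounts=[
            # Shared memory used by MPS, surfaced at /dev/shm in the container.
            Mount(container_path="/dev/shm", host_path=shm_host_path),
            Mount(container_path=pipe_container_dir, host_path=pipe_host_dir),
        ],
    )
```

Because these mounts travel in the allocation response, the pod spec only has to request the GPU resource; the kubelet wires up the rest when the container is created.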