Shiva Krishna Merla
We will look into adding this in a future release.
@elezar to comment if this is supported by our container-toolkit.
@KodieGlosserIBM Can you overwrite container-toolkit version in ClusterPolicy with v1.6.0 (which seems to have worked with v1.9.1 installs) and confirm if this is resolved? ``` toolkit: repository: nvcr.io/nvidia/k8s version: 1.6.0-ubi8...
@KodieGlosserIBM For RHEL7 nodes this workaround is required when the ClusterPolicy instance is created. For RHEL8 I believe this workaround is not required.
@KodieGlosserIBM Since we don't officially claim support for RHEL7, there is no documentation for this. I will check with our PMs on how to handle this case. By the way...
@gchazot It can also be done through `helm upgrade` with the same version of the chart, by changing the driver imagePullPolicy. Either approach gives the same result.
@fanminshi @jinwonkim93 Weird that `imagePullPolicy=Always` didn't update the image. Can you double-check, by describing the driver pod, that the new image was pulled?
@mastier toolkit validation doesn't use "chroot"; it invokes `nvidia-smi` directly, because we expect the toolkit to inject these files automatically. Hence a mount of `/run/nvidia/driver` is not required for this container. Code...
If you add the `debug` fields I mentioned earlier to the toolkit config file, we should see those logs under "/var/log". Can you also attach `/etc/containerd/config.toml`?
@aym-frikha After applying the change, can you delete the gpu-operator-validator pod, so that the toolkit generates logs when the pod runs again? Also, please don't delete the toolkit...