Shiva Krishna Merla

Results 278 comments of Shiva Krishna Merla

@anoopsinghnegi I have updated the image; please re-pull and verify with `quay.io/shivamerla/driver:535.104.05-rhel8.6` and `quay.io/shivamerla/driver:535.104.05-rhel8.8`. MR here: https://gitlab.com/nvidia/container-images/driver/-/merge_requests/269

`failed to get sandbox runtime: no runtime for "nvidia"` is a very generic error that happens when the container-toolkit is not able to apply the runtime config successfully...

@Alwinator From the driver pod logs you posted, it looks like the driver install was successful. Can you exec into that container and run `nvidia-smi`? ``` oc exec -n nvidia-gpu-operator nvidia-driver-daemonset-411.86.202212072103-0-8k4vr --...

@Alwinator If `nvidia-smi` is successful, then the driver DaemonSet will create the file `/run/nvidia/validations/.driver-ctr-ready` from the startup probe [here](https://github.com/NVIDIA/gpu-operator/blob/master/assets/state-driver/0500_daemonset.yaml#L131). Is it possible to double-check whether this status file got created...
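A minimal sketch of the check described in the comment above, assuming it is run somewhere the `/run/nvidia` directory is visible (for example on the node or inside the driver container). Only the file path comes from the comment; the `driver_ready` helper and the constant name are hypothetical and are not part of the GPU operator.

```python
from pathlib import Path

# Path quoted in the comment above; everything else here is illustrative.
DRIVER_READY_FILE = Path("/run/nvidia/validations/.driver-ctr-ready")


def driver_ready(status_file: Path = DRIVER_READY_FILE) -> bool:
    """Return True if the driver readiness status file exists."""
    return status_file.is_file()


if __name__ == "__main__":
    state = "present" if driver_ready() else "missing"
    print(f"{DRIVER_READY_FILE}: {state}")
```

A missing file only says that the status file is not there at the moment of the check; it does not say why.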

From the logs attached to this issue earlier, it looks like the driver directory is mounted. Maybe the driver container restarted for some reason while you were checking and unmounted the /run/nvidia directory?...

@chiragjn Thanks for the detailed report. We are aware of this issue and plan to fix it in the future. With v22.9.2 for driver upgrades, we are planning to...

@klueska Does it make sense to introduce knobs (env/args) to control the allocation logic during `GetPreferredAllocation` within the device plugin?

Thanks for reporting this, @relyt0925. We are working with RH to understand why the tag was missing from the imagestream. This version is picked from the NFD label `feature.node.kubernetes.io/system-os_release.OSTREE_VERSION` and we expect...
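To see which version NFD actually reported, the label named above can be read from a node object. This is a rough sketch that assumes the input is the JSON of a single Kubernetes node (for example as printed by `oc get node <name> -o json`), where labels live under `metadata.labels`. The `ostree_version` helper is hypothetical, and the sample value is a placeholder rather than data from the issue.

```python
import json
from typing import Optional

# Label name quoted in the comment above.
OSTREE_LABEL = "feature.node.kubernetes.io/system-os_release.OSTREE_VERSION"


def ostree_version(node_json: str) -> Optional[str]:
    """Return the OSTREE_VERSION label from a node's JSON, or None if absent."""
    node = json.loads(node_json)
    labels = node.get("metadata", {}).get("labels") or {}
    return labels.get(OSTREE_LABEL)


if __name__ == "__main__":
    # Placeholder node object, for illustration only.
    sample = json.dumps(
        {"metadata": {"labels": {OSTREE_LABEL: "411.86.202212072103-0"}}}
    )
    print(ostree_version(sample))
```

The returned value can then be compared by hand with the tags present in the imagestream.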

@mikehollinger Yes, the workaround to tag manually seems reasonable until the root cause is identified. @fabiendupont Can you help identify why there is a version mismatch here? Maybe NFD didn't...