Shiva Krishna Merla

278 comments by Shiva Krishna Merla

@hbahadorzadeh Can you try `v1.7.0` and check whether the validation pod is still crashing?

@rhysjtevans I believe we verified `460.32.03` on CentOS 8. I will look into the compilation errors soon.

@rhysjtevans The comment [here](https://github.com/NVIDIA/libnvidia-container/blob/master/src/nvc_container.c#L348) explains why the `.real` file is preferred. If it is not present, however, libnvidia-container should fall back to `/sbin/ldconfig`. @elezar to confirm why...
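The "prefer `.real`, else fall back" behaviour described above can be sketched roughly as follows. This is an illustrative Python sketch, not libnvidia-container's code (which is C, in the linked `nvc_container.c`). The helper name `pick_ldconfig`, the `root` parameter, and the assumption that the `.real` binary lives at `/sbin/ldconfig.real` are all hypothetical choices made for the example.

```python
import os
from typing import Callable, Optional

# Candidate paths in order of preference. Assumption: the ".real" binary sits
# next to /sbin/ldconfig (as on Debian/Ubuntu, where ldconfig is a wrapper script).
LDCONFIG_CANDIDATES = ("/sbin/ldconfig.real", "/sbin/ldconfig")


def pick_ldconfig(root: str = "/",
                  exists: Callable[[str], bool] = os.path.exists) -> Optional[str]:
    """Return the first ldconfig candidate that exists under `root`, or None.

    Hypothetical helper for illustration only; not a libnvidia-container API.
    """
    for candidate in LDCONFIG_CANDIDATES:
        # Resolve relative to `root` so a container rootfs can be checked too.
        full_path = os.path.join(root, candidate.lstrip("/"))
        if exists(full_path):
            return full_path
    return None
```

Injecting `exists` keeps the preference order testable without touching the real filesystem; for example, `pick_ldconfig(exists=lambda p: p == "/sbin/ldconfig")` returns the fallback path.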

@smithbk By default, the gpu-operator pod deployed through OLM doesn't have any specific nodeSelector or tolerations. Did you add the nodeSelector by editing the CSV? Can you get the podSpec for the operator Deployment...

@smithbk Can you get the taints on the GPU nodes? We probably need to add matching tolerations to the GPU Operator pod, which is done by editing the CSV. `oc get...
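To see which node taints the operator pod would still need tolerations for, the standard Kubernetes matching rules can be approximated in a few lines. This is a simplified Python sketch working on plain dicts shaped like a node's `.spec.taints` and a pod's `.spec.tolerations`; it ignores `tolerationSeconds`, and the function names and the example taint key `nvidia.com/gpu` are hypothetical.

```python
from typing import Dict, List

Taint = Dict[str, str]
Toleration = Dict[str, str]

# Only these effects keep a pod off a node; PreferNoSchedule is a soft preference.
BLOCKING_EFFECTS = {"NoSchedule", "NoExecute"}


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified version of the Kubernetes toleration-matching rules."""
    # An empty effect on the toleration matches every effect.
    if tol.get("effect") and tol["effect"] != taint.get("effect"):
        return False
    # An empty key (only valid with operator Exists) matches every key.
    if tol.get("key") and tol["key"] != taint.get("key"):
        return False
    operator = tol.get("operator") or "Equal"  # Equal is the default operator
    if operator == "Exists":
        return True
    if operator == "Equal":
        return tol.get("value", "") == taint.get("value", "")
    return False


def untolerated_taints(taints: List[Taint],
                       tolerations: List[Toleration]) -> List[Taint]:
    """Return the blocking taints that no toleration in the list matches."""
    return [
        t for t in taints
        if t.get("effect") in BLOCKING_EFFECTS
        and not any(tolerates(tol, t) for tol in tolerations)
    ]
```

Any taint this returns for a GPU node would need a corresponding toleration added to the operator pod spec in the CSV.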

@smithbk Can you describe the project (`oc describe project nvidia-gpu-operator`) to see whether the annotation that sets the nodeSelector was added? https://docs.openshift.com/container-platform/4.10/nodes/scheduling/nodes-scheduler-taints-tolerations.html#nodes-scheduler-taints-tolerations-projects_nodes-scheduler-taints-tolerations

@smithbk That should be the pod IP, which the kubelet uses for readiness/liveness probes. What do the gpu-operator logs show? Is the ClusterPolicy status `ready`?

@smithbk Can you attach the complete gpu-operator pod logs? That address should be the `kubernetes` service IP, and it should be reachable (`oc get service kubernetes -n default`).
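One quick way to check whether that service IP is reachable from a pod on the affected node is a plain TCP connect. The Python sketch below is a hypothetical helper, not part of the GPU Operator; it assumes `host` is the ClusterIP printed by `oc get service kubernetes -n default` and that the service listens on port 443 (its usual HTTPS port). It only confirms TCP reachability and does not attempt a TLS handshake or an API request.

```python
import socket


def is_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass), unreachable networks, and DNS failures.
        return False
```

If this returns `False` from an ordinary pod as well, the problem is likely in the cluster network rather than in the GPU Operator itself.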

@smithbk We need to understand more about how the cluster is set up. This error should not be specific to the GPU Operator; it should affect any pod that is trying to access...