Shiva Krishna Merla

Results 278 comments of Shiva Krishna Merla

@geoberle Yes, each component now has a nodeSelector label, as below. The GPU operator automatically adds these labels to nodes with NVIDIA GPUs. The reason for this granularity is to...

@koflerm Unloading an existing driver is a little involved when the driver container restarts. We are automating this in the next release. Meanwhile, you will need to evict all other GPU operator...

@koflerm Unfortunately, we don't have a way to set this through the GPU operator yet. You need to edit the file `/usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml` manually on each GPU node.

> @shivamerla I have fixed this now by creating a modified version of this file as a config map and by mounting this in the container-toolkit daemonset at /etc/nvidia-container-runtime/config.toml. This...

@tnakajo With v1.9 we added a feature that removes the dependency on cluster-wide entitlements for installing the NVIDIA driver. On 4.9 and certain z-stream versions of 4.8, NFD adds a special...

@donovat If you have cluster-wide entitlements applied, you can ignore that warning message. It means that with newer OCP versions (4.9+), entitlements can be avoided. Regarding driver...

Can you describe the driver pod and check whether those env variables were added by the operator?

@donovat Looks like I found the issue: we are using the host's `/etc/os-release` to fetch the RHEL_VERSION and OPENSHIFT_VERSION fields, but it looks like on RHEL nodes OPENSHIFT_VERSION is not available in...
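
To illustrate the failure mode described in the comment above, here is a minimal sketch of reading the two fields from os-release style text. It is not the operator's actual code: the helper names `parse_os_release` and `get_versions` are hypothetical, and the sample file contents and version values are made up for the example. The point is only that a lookup for a key that is absent has to be handled explicitly.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict (hypothetical helper)."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        fields[key.strip()] = value
    return fields


def get_versions(text):
    """Return (RHEL_VERSION, OPENSHIFT_VERSION); a missing field comes back as None."""
    fields = parse_os_release(text)
    return fields.get("RHEL_VERSION"), fields.get("OPENSHIFT_VERSION")


# Made-up sample contents, for illustration only.
SAMPLE_WITH_BOTH = 'ID="rhcos"\nRHEL_VERSION="8.4"\nOPENSHIFT_VERSION="4.9"\n'
SAMPLE_WITHOUT = 'ID="rhel"\nVERSION_ID="8.4"\n'

if __name__ == "__main__":
    print(get_versions(SAMPLE_WITH_BOTH))
    print(get_versions(SAMPLE_WITHOUT))
```

With the second sample, both lookups return `None`, so any caller that assumes the fields are always present would need a fallback.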

Sorry, missed this, will try to recreate this.

@sebastianohl @EliasVansteenkiste Did you install the FM packages on the host directly, and are you managing them through systemctl? With the GPU Operator we launch the FM daemon through the driver container when NVSwitch devices are detected. Can...