Felix Heilmeyer
The log file looks like this:

```
time="2023-08-09T20:58:41Z" level=info msg="creating symlinks under /dev/char that correspond to NVIDIA character devices"
time="2023-08-09T20:58:41Z" level=info msg="Error: error validating driver installation: error creating symlink creator: failed...
```
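Note that the error in the excerpt above is logged at `level=info`, so filtering on the level field alone would miss it. A minimal sketch for pulling such messages out of a log, assuming every line consists of `key=value` pairs with optionally double-quoted values as in the excerpt. The function names and the second sample line are hypothetical and only serve as illustration:

```python
import re

# Matches key=value pairs where the value is either a double-quoted string
# (backslash escapes allowed) or a bare token without whitespace.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_log_line(line):
    """Split one key=value log line into a dict of fields."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Strip the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def error_messages(lines):
    """Return messages that start with 'Error:', whatever their level."""
    parsed = (parse_log_line(line) for line in lines)
    return [f["msg"] for f in parsed if f.get("msg", "").startswith("Error:")]


sample = [
    # First line copied from the excerpt above.
    'time="2023-08-09T20:58:41Z" level=info msg="creating symlinks under '
    '/dev/char that correspond to NVIDIA character devices"',
    # Hypothetical, complete line standing in for the truncated one.
    'time="2023-08-09T20:58:41Z" level=info msg="Error: example failure"',
]

if __name__ == "__main__":
    for message in error_messages(sample):
        print(message)
```

Matching on the `msg` prefix instead of `level` is a deliberate choice here, because the validator in this excerpt reports the failure inside an info-level message.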
> @heilerich just to confirm in your case driver is pre-installed on the node with Flatcar Linux?

No, the driver is installed using the driver-container.

> we set the `driverRoot`...
I think we found the difference, and it has nothing to do with the organization policies. We are using the `internal_load_balancer` option of the official TF module.

https://github.com/edgelesssys/constellation/blob/a5a7cec11bb8acb9df753bcce0470346a5edd6ca/terraform/infrastructure/gcp/modules/internal_load_balancer/main.tf#L60
https://github.com/edgelesssys/constellation/blob/a5a7cec11bb8acb9df753bcce0470346a5edd6ca/terraform/infrastructure/gcp/modules/loadbalancer/main.tf#L58

So...
We are also occasionally experiencing this issue. Related: #1850. Probably fixed by #2092.
I have been running the NVIDIA GPU Operator on Flatcar in multiple clusters for some years now, using it to manage the driver as well as the toolkit. The requirement...
I can confirm that a symlink at `/usr/bin/nvidia-smi` will work. We have been running a test system with this for a couple of weeks now. Currently, gpu-operator requires two filesystem...
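To check on a node that the symlink is in place and see where it points, something like the following could be used. This is a minimal sketch: the helper name is hypothetical, and the only path taken from the comment above is `/usr/bin/nvidia-smi`.

```python
import os


def resolve_symlink(path):
    """Return the resolved target if path is a symlink, otherwise None."""
    if not os.path.islink(path):
        return None
    # realpath follows the whole chain of links to the final target.
    return os.path.realpath(path)


if __name__ == "__main__":
    target = resolve_symlink("/usr/bin/nvidia-smi")
    if target is None:
        print("/usr/bin/nvidia-smi is not a symlink (or does not exist)")
    else:
        print(f"/usr/bin/nvidia-smi -> {target}")
```

The helper only reports what is on disk; it does not verify that the target is a working `nvidia-smi` binary.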
> DISABLE_DEV_CHAR_SYMLINK_CREATION

This happens during the driver validation phase of the `gpu-operator-validator` Pod. Possibly, the driver is not validated if the toolkit installation is disabled? The preinstalled toolkit scenario is...
The sysext above works well on our test system with gpu-operator and `driver/toolkit.enabled=false`. We had to clean up some files on nodes that previously had a gpu-operator-installed driver/toolkit and run...