Benjamin Le Rohellec

6 comments by Benjamin Le Rohellec

On my IPv6 cluster, the readiness probe fails.

> Readiness probe failed: Get "http://[_IPv6_]:19001/ready": dial tcp [_IPv6_]:19001: connect: connection refused

# Test

I ran the test from https://gateway.envoyproxy.io/latest/user/http-routing/

# Fix...

If you remove the env variable NVIDIA_MIG_MONITOR_DEVICES from the _nvidia-device-plugin-daemonset_ daemonset, you can switch to `privileged: false`, and _NVIDIA_VISIBLE_DEVICES_ then works for me. I don’t have cards with MIG support. I’m...

> @Baenimyr the configuration are for nvidia device plugin? how do i make just one node to hide a subset of GPUs?

With NVIDIA_VISIBLE_DEVICES and `privileged: false`, the configuration is...

You can try to add a _shutdown_ command to the [set-compute-mode](https://github.com/nebuly-ai/k8s-device-plugin/blob/1e18198df2e7fb57c0061b39daf93d7fc7429263/deployments/helm/nvidia-device-plugin/templates/daemonset.yml#L67) container. This container must wait and run `nvidia-smi -c 0` when it receives a SIGINT.
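The wait-then-reset behaviour described above could be sketched as follows. This is only an illustration, not the plugin's actual code: the function name `wait_then_reset` and its parameters are hypothetical, and it assumes a Python interpreter and `nvidia-smi` are available in the container image. The comment mentions SIGINT; since the kubelet normally sends SIGTERM when a pod is terminated, the sketch listens for both.

```python
import signal
import subprocess
import time


def wait_then_reset(
    reset_cmd=("nvidia-smi", "-c", "0"),        # command taken from the comment above
    signals=(signal.SIGINT, signal.SIGTERM),    # assumption: handle both signals
    run=subprocess.run,                         # injectable so the sketch can be tested without a GPU
    poll_seconds=0.2,                           # hypothetical polling interval
):
    """Block until one of `signals` arrives, then run `reset_cmd` once."""
    received = []

    def handler(signum, frame):
        # Only record the signal; the real work happens in the main loop.
        received.append(signum)

    # Must be called from the main thread, as signal.signal requires.
    for sig in signals:
        signal.signal(sig, handler)

    # Sleep in short intervals until a signal has been recorded.
    while not received:
        time.sleep(poll_seconds)

    # Reset the compute mode to the default (0) before the container exits.
    return run(list(reset_cmd), check=False)
```

Calling `wait_then_reset()` as the last step of the container's command would keep the container alive until it is stopped and then reset the compute mode. The pod's termination grace period has to be long enough for `nvidia-smi` to finish.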

Have you seen this PR? https://github.com/NVIDIA/k8s-device-plugin/pull/490 Maybe you can use the MPS daemon from NVIDIA.

I have some templates for my personal server that I can share with you: https://gitlab.com/carvel-kapps The syntax of ytt+kapp is easier than Helm.

Country: France
Usage scenario: personal...