microk8s
When running microk8s enable gpu - Error: cannot re-use a name that is still in use
After enabling kubeflow, I enabled gpu, but it never seemed to start the relevant pods. I tried again by disabling gpu and then re-enabling it, and now I just get:
**Enabling NVIDIA GPU
Addon dns is already enabled.
Addon helm3 is already enabled.
Installing NVIDIA Operator
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/2210/credentials/client.config
"nvidia" already exists with the same configuration, skipping
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/2210/credentials/client.config
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nvidia" chart repository
Update Complete. ⎈Happy Helming!⎈
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/2210/credentials/client.config
Error: cannot re-use a name that is still in use**
My searches for this error seem to point to Helm. To confirm, I have not enabled the Helm addon myself, although as the output above suggests, the gpu addon enables helm3 (as you'll see in the inspect output).
inspection-report-20210603_091631.tar.gz
Okay, I've managed to resolve the issue above as follows:
- The helm3 addon was enabled by the gpu addon, as shown above.
- microk8s helm3 ls displays a deployment of 'gpu-operator'.
- Delete this with: microk8s helm3 uninstall gpu-operator --namespace default
Now running microk8s enable gpu results in:
Enabling NVIDIA GPU
[sudo] password for nathan:
Addon dns is already enabled.
Addon helm3 is already enabled.
Installing NVIDIA Operator
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/2210/credentials/client.config
"nvidia" already exists with the same configuration, skipping
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/2210/credentials/client.config
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nvidia" chart repository
Update Complete. ⎈Happy Helming!⎈
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/2210/credentials/client.config
NAME: gpu-operator
LAST DEPLOYED: Thu Jun 3 09:38:47 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NVIDIA is enabled
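For anyone hitting the same "cannot re-use a name" error, the workaround above can be sketched as a small script. This is only a sketch: it assumes the leftover release is named gpu-operator and lives in the default namespace, as it did in this thread, and the has_release helper is a hypothetical name introduced here, not part of microk8s or Helm.

```shell
#!/usr/bin/env bash
# Sketch: remove a leftover gpu-operator Helm release before re-enabling the gpu addon.
# Assumption: release name "gpu-operator" in namespace "default", as reported in this thread.

# has_release NAME: reads `helm ls` style output on stdin (header on line 1)
# and succeeds if NAME appears in the first column of any later line.
has_release() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Only touch the cluster when microk8s is actually installed.
if command -v microk8s >/dev/null 2>&1; then
  if microk8s helm3 ls --namespace default | has_release gpu-operator; then
    # The stale release is what triggers "cannot re-use a name that is still in use".
    microk8s helm3 uninstall gpu-operator --namespace default
  fi
  microk8s enable gpu
fi
```

If the release was installed into a different namespace on your machine, adjust the --namespace value in both commands.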
But I don't seem to get pod nvidia-device-plugin-daemonset-xxxx
My machine has nvidia-smi output of:
Thu Jun 3 09:46:25 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04 Driver Version: 450.102.04 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:01:00.0 On | N/A |
| N/A 55C P0 32W / N/A | 836MiB / 7979MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2106 G /usr/lib/xorg/Xorg 397MiB |
| 0 N/A N/A 5649 G /usr/bin/gnome-shell 128MiB |
| 0 N/A N/A 8943 G ...gAAAAAAAAA --shared-files 198MiB |
| 0 N/A N/A 9530 G ...AAAAAAAAA= --shared-files 64MiB |
| 0 N/A N/A 10956 G ...AAAAAAAAA= --shared-files 42MiB |
+-----------------------------------------------------------------------------+
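Since the driver is clearly visible on the host, one way to narrow this down is to check whether the device plugin pod exists in any namespace and whether the node advertises an nvidia.com/gpu resource. The following is a rough sketch, not a confirmed diagnosis. The pods_with_prefix helper is a hypothetical name introduced here, and the pod name prefix is taken from the pod name mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: look for the NVIDIA device plugin pod and the nvidia.com/gpu node resource.

# pods_with_prefix PREFIX: reads `kubectl get pods --all-namespaces` output on stdin
# (NAMESPACE is column 1, NAME is column 2) and prints lines whose NAME starts with PREFIX.
pods_with_prefix() {
  awk -v p="$1" 'NR > 1 && index($2, p) == 1'
}

# Only query the cluster when microk8s is actually installed.
if command -v microk8s >/dev/null 2>&1; then
  # Prints nothing if the daemonset never created a pod.
  microk8s kubectl get pods --all-namespaces | pods_with_prefix nvidia-device-plugin-daemonset

  # If the device plugin is running, the node should list nvidia.com/gpu under its resources.
  microk8s kubectl describe nodes | grep 'nvidia.com/gpu' \
    || echo "no nvidia.com/gpu resource advertised by any node"
fi
```

If the first command prints nothing, the operator has not yet deployed the device plugin, and no pod will be able to request a GPU regardless of what the Kubeflow dashboard sends.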
What I'm really trying to do is create a notebook in the kubeflow dashboard with the gpu image: gcr.io/kubeflow-images-public/tensorflow-1.15.2-notebook-gpu:1.0.0
But when I set the number of GPUs to 1 and the GPU vendor to NVIDIA and click launch, I get the error: "Notebook.kubeflow.org "mnist" is invalid: spec.template.spec.containers.resources.limits.nvidia.com/gpu: Invalid value: "integer": spec.template.spec.containers.resources.limits.nvidia.com/gpu in body must be of type string: "integer"". Only by selecting 'None' can I get it to launch, but surely that won't use the GPU, or am I misunderstanding something?
+1
Same for me, any help?
I'm facing the same issue. Also, I don't see a gpu-operator namespace:
devgpu@srarya:/etc/apt$ microk8s.kubectl get ns
NAME STATUS AGE
container-registry Active 3d21h
default Active 3d22h
gpu-operator-resources Active 2d2h
kube-node-lease Active 3d22h
kube-public Active 3d22h
kube-system Active 3d22h
olm Active 47h
operators Active 2d
devgpu@srarya:/etc/apt$ microk8s.kubectl -n gpu-operator-resources get pods
NAME READY STATUS RESTARTS AGE
gpu-operator-6688b48999-sb9l2 1/1 Running 2 (2d1h ago) 2d2h
gpu-operator-node-feature-discovery-master-59b4b67f4f-qm6hv 1/1 Running 2 (2d1h ago) 2d2h
gpu-operator-node-feature-discovery-worker-l2bwx 1/1 Running 3 (2d1h ago) 2d2h
devgpu@srarya:/etc/apt$
Any help on this is very much appreciated.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.