
When running microk8s enable gpu - Error: cannot re-use a name that is still in use

Open nroberts1 opened this issue 3 years ago • 3 comments

After enabling kubeflow I enabled gpu, but unfortunately it never seemed to start the relevant pods. I tried again by disabling gpu and then re-enabling it, and now I just get:

**Enabling NVIDIA GPU

Addon dns is already enabled.

Addon helm3 is already enabled.

Installing NVIDIA Operator

WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/2210/credentials/client.config

"nvidia" already exists with the same configuration, skipping

WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/2210/credentials/client.config

Hang tight while we grab the latest from your chart repositories...

...Successfully got an update from the "nvidia" chart repository

Update Complete. ⎈Happy Helming!⎈

WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/2210/credentials/client.config

Error: cannot re-use a name that is still in use**

My attempts to search for this error seem to point to Helm. To confirm, I have not enabled the Helm addon myself, although as the output above suggests, the gpu addon enables helm3 (as you'll see in the inspect output).

inspection-report-20210603_091631.tar.gz Please run microk8s inspect and attach the generated tarball to this issue.

We appreciate your feedback. Thank you for using microk8s.

nroberts1 avatar Jun 03 '21 08:06 nroberts1

Okay I've managed to resolve the issue above by:

The helm3 addon was enabled by the gpu addon above.

microk8s helm3 ls displays a deployment of 'gpu-operator'.

Delete this with "microk8s helm3 uninstall gpu-operator --namespace default".

Now running microk8s enable gpu results in:

Enabling NVIDIA GPU

[sudo] password for nathan:

Addon dns is already enabled.

Addon helm3 is already enabled.

Installing NVIDIA Operator

WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/2210/credentials/client.config

"nvidia" already exists with the same configuration, skipping

WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/2210/credentials/client.config

Hang tight while we grab the latest from your chart repositories...

...Successfully got an update from the "nvidia" chart repository

Update Complete. ⎈Happy Helming!⎈

WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /var/snap/microk8s/2210/credentials/client.config

NAME: gpu-operator

LAST DEPLOYED: Thu Jun 3 09:38:47 2021

NAMESPACE: default

STATUS: deployed

REVISION: 1

TEST SUITE: None

NVIDIA is enabled
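For anyone hitting the same error, the cleanup described above can be sketched as a small shell function. This is only a rough sketch: the release name gpu-operator and the namespace default are taken from this thread and may differ on other installs, and the function name cleanup_and_enable_gpu is made up for illustration.

```shell
# Sketch: remove a leftover gpu-operator Helm release, then re-enable the gpu addon.
# Release name and namespace are the ones seen in this thread (assumption for other setups).

cleanup_and_enable_gpu() {
    release="gpu-operator"
    namespace="default"

    # `helm ls -q` prints only release names, one per line.
    if microk8s helm3 ls --namespace "$namespace" -q | grep -qx "$release"; then
        echo "Found leftover release '$release', uninstalling"
        microk8s helm3 uninstall "$release" --namespace "$namespace" || return 1
    fi

    # With the old release gone, the addon can install the chart under the same name again.
    microk8s enable gpu
}

# Usage (may prompt for sudo, as in the output above):
#   cleanup_and_enable_gpu
```

The check before the uninstall keeps the function safe to re-run when no stale release is present.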

But I don't seem to get pod nvidia-device-plugin-daemonset-xxxx
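A quick way to double-check that the pod is really missing, rather than just living outside the default namespace, is to search all namespaces. The sketch below assumes the pod name still starts with nvidia-device-plugin-daemonset; the function name find_device_plugin_pods is hypothetical.

```shell
# Sketch: look for the NVIDIA device plugin pod in every namespace.
# The pod name prefix is taken from this thread and is an assumption for other versions.

find_device_plugin_pods() {
    # --all-namespaces widens the search; --no-headers drops the column header line.
    microk8s kubectl get pods --all-namespaces --no-headers 2>/dev/null \
        | grep "nvidia-device-plugin-daemonset" \
        || echo "no nvidia-device-plugin-daemonset pod found"
}

# Usage:
#   find_device_plugin_pods
```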

My machine has the following nvidia-smi output:

Thu Jun 3 09:46:25 2021

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   55C    P0    32W /  N/A |    836MiB /  7979MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2106      G   /usr/lib/xorg/Xorg                397MiB |
|    0   N/A  N/A      5649      G   /usr/bin/gnome-shell              128MiB |
|    0   N/A  N/A      8943      G   ...gAAAAAAAAA --shared-files      198MiB |
|    0   N/A  N/A      9530      G   ...AAAAAAAAA= --shared-files       64MiB |
|    0   N/A  N/A     10956      G   ...AAAAAAAAA= --shared-files       42MiB |
+-----------------------------------------------------------------------------+

What I'm really trying to do is create a notebook in the kubeflow dashboard with the gpu image: gcr.io/kubeflow-images-public/tensorflow-1.15.2-notebook-gpu:1.0.0

But when I set the number of GPUs to 1 and the GPU vendor to NVIDIA and click launch, I get the error: "Notebook.kubeflow.org "mnist" is invalid: spec.template.spec.containers.resources.limits.nvidia.com/gpu: Invalid value: "integer": spec.template.spec.containers.resources.limits.nvidia.com/gpu in body must be of type string: "integer"". Only by selecting 'None' can I get it to launch, but surely that won't use the GPU, or am I misunderstanding something?

nroberts1 avatar Jun 03 '21 08:06 nroberts1

+1

sebastianohl avatar May 20 '22 18:05 sebastianohl

Same for me, any help?

trongvanhpkt99 avatar Jun 14 '22 07:06 trongvanhpkt99

I'm facing the same issue. Also, I don't see a gpu-operator namespace:

devgpu@srarya:/etc/apt$ microk8s.kubectl get ns
NAME                     STATUS   AGE
container-registry       Active   3d21h
default                  Active   3d22h
gpu-operator-resources   Active   2d2h
kube-node-lease          Active   3d22h
kube-public              Active   3d22h
kube-system              Active   3d22h
olm                      Active   47h
operators                Active   2d
devgpu@srarya:/etc/apt$ microk8s.kubectl -n gpu-operator-resources get pods
NAME                                                          READY   STATUS    RESTARTS       AGE
gpu-operator-6688b48999-sb9l2                                 1/1     Running   2 (2d1h ago)   2d2h
gpu-operator-node-feature-discovery-master-59b4b67f4f-qm6hv   1/1     Running   2 (2d1h ago)   2d2h
gpu-operator-node-feature-discovery-worker-l2bwx              1/1     Running   3 (2d1h ago)   2d2h
devgpu@srarya:/etc/apt$

Any help on this is very much appreciated.

Amithpn avatar Sep 30 '22 08:09 Amithpn

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 26 '23 09:08 stale[bot]