mayastor-csi node-driver-registrar Registration process failed on aws eks

Open omarmakni opened this issue 2 years ago • 7 comments

Hello. When I follow the instructions for Mayastor on an AWS cluster (EKS), the mayastor-csi node-driver-registrar always fails with the following log:

I0608 14:48:02.757010 1 main.go:113] Version: v2.1.0-0-g80d42f24
I0608 14:48:02.757918 1 connection.go:153] Connecting to unix:///csi/csi.sock
I0608 14:48:02.852446 1 node_register.go:52] Starting Registration Server at: /registration/io.openebs.csi-mayastor-reg.sock
I0608 14:48:02.852593 1 node_register.go:61] Registration Server started at: /registration/io.openebs.csi-mayastor-reg.sock
I0608 14:48:02.852650 1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0608 14:48:04.226354 1 main.go:80] Received GetInfo call: &InfoRequest{}
I0608 14:48:04.589035 1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "kubernetes.io/hostname":"ip-X-X-X-X" but existing label is "kubernetes.io/hostname":"ip-X-X-X-X.us-east-1.compute.internal",}
E0608 14:48:04.589073 1 main.go:92] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "kubernetes.io/hostname":"ip-X-X-X-X" but existing label is "kubernetes.io/hostname":"ip-X-X-X-X.us-east-1.compute.internal", restarting registration container.

omarmakni avatar Jun 09 '22 10:06 omarmakni

Hi, we have made a fix for this and the change is currently on release/1.0.2. Can you please try it? You can use the mayastor-daemonset YAML from the deploy folder.

FYI: with the - "--node-name=$(MY_NODE_NAME)" flag removed, Mayastor registers itself with the hostname, so you would need to pass the hostname as the node name when creating a pool.

Abhinandan-Purkait avatar Jun 14 '22 06:06 Abhinandan-Purkait

The official release of version v1.0.2, accompanied by the availability of images provided by the project maintainers, is expected within the next 30 days.

GlennBullingham avatar Jun 23 '22 22:06 GlennBullingham

I have a similar issue with the error from csi pods:

mayastor-csi-6pg5v                1/2     CrashLoopBackOff   6 (2m51s ago)   8m48s
mayastor-csi-gfp4r                1/2     CrashLoopBackOff   6 (3m1s ago)    8m47s
mayastor-csi-h4ttr                1/2     CrashLoopBackOff   6 (2m37s ago)   8m47s

logs from csi-driver-registrar container:

I0704 02:01:58.054673       1 main.go:113] Version: v2.1.0-0-g80d42f24
I0704 02:01:58.055231       1 connection.go:153] Connecting to unix:///csi/csi.sock
I0704 02:01:58.056390       1 node_register.go:52] Starting Registration Server at: /registration/io.openebs.csi-mayastor-reg.sock
I0704 02:01:58.056508       1 node_register.go:61] Registration Server started at: /registration/io.openebs.csi-mayastor-reg.sock
I0704 02:01:58.056594       1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0704 02:01:59.463465       1 main.go:80] Received GetInfo call: &InfoRequest{}
I0704 02:01:59.822520       1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "kubernetes.io/hostname":"95" but existing label is "kubernetes.io/hostname":"95.xxx.xx.xxx",}
E0703 18:09:13.292330       1 main.go:92] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "kubernetes.io/hostname":"95" but existing label is "kubernetes.io/hostname":"95.xxx.xx.xxx", restarting registration container.

Here 95.xxx.xx.xxx is the hostname of that node, and the node carries the label kubernetes.io/hostname=95.xxx.xx.xxx.

It seems the CSI driver reported only the first portion of the full hostname (95).

Then I tried the manifests from release/1.0.2 for both mayastor and mayastor-control-plane, but got the same error.

Kubernetes version: 1.23 (RKE2)

tz-torchai avatar Jul 04 '22 02:07 tz-torchai
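
To compare the two values named in the error above, the existing label can be listed for every node. This is a minimal sketch, assuming `kubectl` is configured for the affected cluster; the function name `show_hostname_labels` is made up for illustration and is not part of Mayastor.

```shell
# Hypothetical helper (the name is illustrative, not part of Mayastor).
# Lists every node with an extra column holding its kubernetes.io/hostname
# label, i.e. the "existing label" value quoted in the registrar error.
show_hostname_labels() {
  kubectl get nodes -L kubernetes.io/hostname
}
```

Running `show_hostname_labels` prints the label in an extra column. If that value contains a period while the driver reports only the part before it, the registrar log shows the topology value collision seen in this thread.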

I'm having the same issue as @tz-torchai above. I followed the instructions at https://mayastor.gitbook.io/introduction/quickstart/deploy-mayastor and installed the csi-daemonset with the following command.

> kubectl apply -f https://raw.githubusercontent.com/openebs/mayastor/master/deploy/csi-daemonset.yaml
> kubectl -n mayastor get pods -w
NAME                 READY   STATUS             RESTARTS        AGE
mayastor-csi-2pxr2   1/2     CrashLoopBackOff   6 (2m21s ago)   12m
mayastor-csi-gklm8   1/2     CrashLoopBackOff   6 (2m21s ago)   12m
mayastor-csi-mgtt5   1/2     CrashLoopBackOff   6 (4m34s ago)   12m
mayastor-csi-sxhtn   1/2     CrashLoopBackOff   6 (3m30s ago)   12m
mayastor-etcd-0      1/1     Running            0               13m
mayastor-etcd-1      1/1     Running            0               13m
mayastor-etcd-2      1/1     Running            0               13m
nats-0               2/2     Running            0               15m
nats-1               2/2     Running            0               14m
nats-2               2/2     Running            0               14m
> kubectl -n mayastor logs mayastor-csi-2pxr2 -c csi-driver-registrar
I0816 20:34:51.285641       1 main.go:113] Version: v2.1.0-0-g80d42f24
I0816 20:34:51.286084       1 connection.go:153] Connecting to unix:///csi/csi.sock
I0816 20:34:51.287162       1 node_register.go:52] Starting Registration Server at: /registration/io.openebs.csi-mayastor-reg.sock
I0816 20:34:51.287276       1 node_register.go:61] Registration Server started at: /registration/io.openebs.csi-mayastor-reg.sock
I0816 20:34:51.287297       1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0816 20:34:52.735125       1 main.go:80] Received GetInfo call: &InfoRequest{}
I0816 20:34:53.120028       1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "kubernetes.io/hostname":"10" but existing label is "kubernetes.io/hostname":"10.41.3.92",}
E0816 20:34:53.120079       1 main.go:92] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "kubernetes.io/hostname":"10" but existing label is "kubernetes.io/hostname":"10.41.3.92", restarting registration container.

acceleratxr avatar Aug 16 '22 20:08 acceleratxr

This is broken on 1.0.2 AFAICT: we strip out the subdomain, which doesn't really work when your hostname is an IP address. We've changed the way we handle node names on the latest code base, but that hasn't been released yet.

You can have a peek at the develop branch by using this Helm chart, but please be aware that it is not compatible with the latest release and will likely incur breaking changes until it reaches a new release, so it is not advised for production.

I've made a sneaky test image compatible with 1.0.2 if you want to quickly see whether not splitting the hostname works. It might well fail elsewhere, as I've not tested this at all, so use at your peril :) mayadata/mayastor-csi:2a4f05e0b37b

tiagolobocastro avatar Aug 16 '22 22:08 tiagolobocastro

Your hack seems to be holding so far...

NAME                 READY   STATUS    RESTARTS        AGE
mayastor-csi-6njnh   2/2     Running   0               2m55s
mayastor-csi-jpk66   2/2     Running   0               2m55s
mayastor-csi-nn7vn   2/2     Running   0               2m55s
mayastor-csi-rnrxl   2/2     Running   0               2m55s
mayastor-etcd-0      1/1     Running   1 (127m ago)    131m
mayastor-etcd-1      1/1     Running   1 (127m ago)    131m
mayastor-etcd-2      0/1     Running   24 (7m4s ago)   131m
nats-0               2/2     Running   0               132m
nats-1               2/2     Running   0               132m
nats-2               2/2     Running   0               131m

acceleratxr avatar Aug 16 '22 23:08 acceleratxr

"This is broken on 1.0.2 AFAICT, we strip out the subdomain, which doesn't really work when your hostname is an ip address."

Well, it certainly is broken for all hostnames with a period in them, so the typical FQDN hostname is affected too.

zrav avatar Aug 17 '22 06:08 zrav
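
The effect described above can be reproduced with the three hostnames reported in this thread. The snippet below is only an illustration of "keep everything before the first period", not Mayastor's actual code; `strip_subdomain` is a hypothetical name.

```shell
# Hypothetical illustration, not Mayastor's implementation:
# keep only the text before the first period in a hostname.
strip_subdomain() {
  printf '%s\n' "${1%%.*}"
}

# Hostnames taken from the logs in this thread.
for name in "ip-X-X-X-X.us-east-1.compute.internal" "95.xxx.xx.xxx" "10.41.3.92"; do
  printf '%s -> %s\n' "$name" "$(strip_subdomain "$name")"
done
# ip-X-X-X-X.us-east-1.compute.internal -> ip-X-X-X-X
# 95.xxx.xx.xxx -> 95
# 10.41.3.92 -> 10
```

The shortened values match what the driver reported in each log ("ip-X-X-X-X", "95", "10"), while the node label keeps the full name, which is consistent with the collision affecting any hostname that contains a period.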

On v1.0.4 this still fails on EKS, where the node name looks like ip-xxx-xx-x-x.eu-central-1.compute.internal

rhrytskiv avatar Dec 23 '22 15:12 rhrytskiv

This looks like https://github.com/kubernetes-csi/node-driver-registrar/issues/205, but I think it's because Mayastor is sending it the wrong hostname?

robjcaskey avatar Dec 23 '22 16:12 robjcaskey

@rhrytskiv I had the same issue yesterday, then tested on develop and it worked fine, so it has been addressed but not backported.

robjcaskey avatar Dec 24 '22 13:12 robjcaskey

Hi, I think this has been fixed on develop but not backported. To test this, could you please add - "-N$(MY_NODE_NAME)" to the mayastor-daemonset arguments and see if it works? Thanks

tiagolobocastro avatar Dec 25 '22 10:12 tiagolobocastro
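
After editing the daemonset, one way to confirm that the extra argument actually landed is to print the container arguments back out. This is a rough sketch: the `mayastor` namespace is taken from the commands earlier in the thread, while the function name `show_daemonset_args` and the daemonset name passed to it are assumptions to adjust for your deployment.

```shell
# Hypothetical helper (the name is illustrative, not part of Mayastor).
# Prints the args of every container in the given DaemonSet in the
# "mayastor" namespace, so the presence of -N$(MY_NODE_NAME) can be checked.
show_daemonset_args() {
  kubectl -n mayastor get daemonset "$1" \
    -o jsonpath='{.spec.template.spec.containers[*].args}'
}
```

For example, `show_daemonset_args mayastor` (assuming the daemonset is named `mayastor`) should list the suggested flag once the change has been applied.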