dqlite not listening on socket after update to 1.31.3

Open gsnsw-felixs opened this issue 1 year ago • 5 comments

Summary

A 3-node cluster failed after an automatic update to 1.31.3: the dqlite service starts, but it is not listening on /var/snap/microk8s/7449/var/kubernetes/backend/kine.sock:12379
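One way to confirm this symptom from a script is to try connecting to the socket. The following is a minimal Python sketch, not part of MicroK8s or k8s-dqlite. `unix_socket_accepting` is a hypothetical helper. It assumes the path to test is the `--listen` value with the `unix://` prefix removed, and the report does not confirm whether the trailing `:12379` is part of the on-disk file name. It probably needs to run as root to reach a socket under `/var/snap/microk8s`.

```python
import socket


def unix_socket_accepting(path: str, timeout: float = 2.0) -> bool:
    """Return True if some process accepts connections on the Unix socket at `path`.

    Hypothetical diagnostic helper; not part of MicroK8s or k8s-dqlite.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # A missing file (FileNotFoundError), a stale socket file with no
        # listener (ConnectionRefusedError), permission errors and timeouts
        # all count as "not listening".
        return False
    finally:
        sock.close()
```

For example, `unix_socket_accepting("/var/snap/microk8s/7449/var/kubernetes/backend/kine.sock:12379")` returning `False` would match what is described here. A successful connect only shows that something accepts connections on that path. It does not show that the datastore behind it is healthy.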

What Should Happen Instead?

The dqlite service should either start correctly or fail with an error.

Reproduction Steps

All 3 nodes in the cluster have this same issue. Another non-HA node seemed to update OK though.

Introspection Report

Sorry, can't post system details.

ubuntu@k8s-qa-001:~$ sudo systemctl status snap.microk8s.daemon-k8s-dqlite
● snap.microk8s.daemon-k8s-dqlite.service - Service for snap application microk8s.daemon-k8s-dqlite
     Loaded: loaded (/etc/systemd/system/snap.microk8s.daemon-k8s-dqlite.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2024-12-06 08:28:40 AEDT; 1h 21min ago
   Main PID: 767 (k8s-dqlite)
      Tasks: 18 (limit: 37663)
     Memory: 209.8M
     CGroup: /system.slice/snap.microk8s.daemon-k8s-dqlite.service
             └─767 /snap/microk8s/7449/bin/k8s-dqlite --storage-dir=/var/snap/microk8s/7449/var/kubernetes/backend/ --listen=unix:///var/snap/microk8s/7449/var/kubernetes/backend/kine.sock:12379

Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + '[' -e /var/snap/microk8s/7449/args/k8s-dqlite-env ']'
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + . /var/snap/microk8s/7449/args/k8s-dqlite-env
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + set +a
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[2086]: ++ cat /var/snap/microk8s/7449/args/k8s-dqlite
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + declare -a 'args=(--storage-dir=${SNAP_DATA}/var/kubernetes/backend/
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: --listen=unix://${SNAP_DATA}/var/kubernetes/backend/kine.sock:12379)'
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + exec /snap/microk8s/7449/bin/k8s-dqlite --storage-dir=/var/snap/microk8s/7449/var/kubernetes/backend/ --listen=unix:///var/snap/microk8s/7449/var/kubernetes/backend/kine.sock:12379
Dec 06 08:28:56 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: time="2024-12-06T08:28:56+11:00" level=info msg="Configure dqlite failure domain" failure-domain=1
Dec 06 08:28:56 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: time="2024-12-06T08:28:56+11:00" level=info msg="Disable TLS ClientSessionCache"
Dec 06 08:28:56 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: time="2024-12-06T08:28:56+11:00" level=info msg="Enable TLS" min_tls_version=tls12
ubuntu@k8s-qa-001:~$ netstat -a --unix | grep kine.sock
ubuntu@k8s-qa-001:~$
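The `netstat -a --unix | grep kine.sock` check reads the kernel's table of Unix-domain sockets. Below is a rough Python equivalent, sketched under two assumptions: the host runs Linux, and `/proc/net/unix` has the column layout described in proc(5). `listening_unix_paths` is a hypothetical helper. It keeps only sockets whose `Flags` column includes `__SO_ACCEPTCON` (0x10000), which the kernel sets on listening sockets.

```python
from typing import List

# __SO_ACCEPTCON: set in the Flags column of /proc/net/unix for listening sockets.
ACCEPTCON = 0x10000


def listening_unix_paths(proc_net_unix: str, needle: str = "kine.sock") -> List[str]:
    """Return paths of listening Unix sockets whose path contains `needle`.

    `proc_net_unix` is the text of /proc/net/unix. Columns are:
    Num RefCount Protocol Flags Type St Inode Path (Path may be absent).
    """
    paths = []
    for line in proc_net_unix.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # unnamed socket: no Path column
        flags = int(fields[3], 16)
        path = fields[7]
        if flags & ACCEPTCON and needle in path:
            paths.append(path)
    return paths
```

On an affected node, passing the contents of `/proc/net/unix` to this function should return an empty list, mirroring the empty `grep` output in the report. Filtering on the listening flag, rather than on the path alone, avoids counting client connections or a stale socket file as a live listener.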

Can you suggest a fix?

Are you interested in contributing with a fix?

no

gsnsw-felixs, Dec 05 '24 23:12

Thank you for reporting this, @gsnsw-felixs. When was this deployment set up? Was it tracking the 1.31 release? Was the first version deployed 1.31.0, or something else?

ktsakalozos, Dec 06 '24 06:12

Hello @gsnsw-felixs, would you be able to tell us which snap revision you've updated from?

louiseschmidtgen, Dec 06 '24 07:12

I'm pretty sure it had been set to:

tracking: 1.31/stable

So it was probably on 1.31.2.

We removed and reinstalled the snap and got it running again on 1.31.3 BTW.

gsnsw-felixs, Dec 07 '24 00:12

It had been around a long time, so it could have started on 1.22.

gsnsw-felixs, Dec 07 '24 00:12

Any solution to this?

mikhatanu, Jun 24 '25 06:06