dqlite not listening on socket after update to 1.31.3
Summary
A 3-node cluster failed after an auto-update to 1.31.3. The dqlite service starts but is not listening on /var/snap/microk8s/7449/var/kubernetes/backend/kine.sock:12379
What Should Happen Instead?
The dqlite service should either start correctly or report an error.
Reproduction Steps
All 3 nodes in the cluster have this same issue. Another non-HA node seemed to update OK though.
Introspection Report
Sorry, can't post system details.
ubuntu@k8s-qa-001:~$ sudo systemctl status snap.microk8s.daemon-k8s-dqlite
● snap.microk8s.daemon-k8s-dqlite.service - Service for snap application microk8s.daemon-k8s-dqlite
     Loaded: loaded (/etc/systemd/system/snap.microk8s.daemon-k8s-dqlite.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2024-12-06 08:28:40 AEDT; 1h 21min ago
   Main PID: 767 (k8s-dqlite)
      Tasks: 18 (limit: 37663)
     Memory: 209.8M
     CGroup: /system.slice/snap.microk8s.daemon-k8s-dqlite.service
             └─767 /snap/microk8s/7449/bin/k8s-dqlite --storage-dir=/var/snap/microk8s/7449/var/kubernetes/backend/ --listen=unix:///var/snap/microk8s/7449/var/kubernetes/backend/kine.sock:12379

Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + '[' -e /var/snap/microk8s/7449/args/k8s-dqlite-env ']'
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + . /var/snap/microk8s/7449/args/k8s-dqlite-env
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + set +a
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[2086]: ++ cat /var/snap/microk8s/7449/args/k8s-dqlite
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + declare -a 'args=(--storage-dir=${SNAP_DATA}/var/kubernetes/backend/
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: --listen=unix://${SNAP_DATA}/var/kubernetes/backend/kine.sock:12379)'
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + exec /snap/microk8s/7449/bin/k8s-dqlite --storage-dir=/var/snap/microk8s/7449/var/kubernetes/backend/ --listen=unix:///var/snap/microk8s/7449/var/kubernetes/backend/kine.sock:12379
Dec 06 08:28:56 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: time="2024-12-06T08:28:56+11:00" level=info msg="Configure dqlite failure domain" failure-domain=1
Dec 06 08:28:56 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: time="2024-12-06T08:28:56+11:00" level=info msg="Disable TLS ClientSessionCache"
Dec 06 08:28:56 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: time="2024-12-06T08:28:56+11:00" level=info msg="Enable TLS" min_tls_version=tls12

ubuntu@k8s-qa-001:~$ netstat -a --unix | grep kine.sock
ubuntu@k8s-qa-001:~$
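For anyone checking the same symptom, the socket file can also be tested directly instead of going through netstat. This is only a minimal sketch: check_unix_socket is a hypothetical helper, not something shipped with microk8s, and it assumes the socket file name is the full kine.sock:12379 path taken from the --listen argument above. It only confirms that a socket file exists at that path, not that a process is accepting connections on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of microk8s): report whether a path
# exists and is a Unix domain socket.
check_unix_socket() {
    sock_path="$1"
    # -S is true only if the path exists and is a socket.
    if [ -S "$sock_path" ]; then
        echo "socket present: $sock_path"
        return 0
    fi
    echo "no socket at: $sock_path"
    return 1
}

# Example call (path assumed from the --listen argument in the report):
# check_unix_socket "/var/snap/microk8s/7449/var/kubernetes/backend/kine.sock:12379"
```

A non-zero return here while systemctl still reports the service as active (running) would match the behaviour described in this report.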
Can you suggest a fix?
Are you interested in contributing with a fix?
no
Thank you for reporting this @gsnsw-felixs. When was this deployment setup? Was it tracking the 1.31 release? Was the first version deployed 1.31.0 or something else?
Hello @gsnsw-felixs, would you be able to tell us which snap revision you've updated from?
I'm pretty sure it had been set to:
tracking: 1.31/stable
So it was probably on 1.31.2 before the update.
We removed and reinstalled the snap and got it running again on 1.31.3 BTW.
The deployment had been around a long time, so it could have started as 1.22.
Any solution to this?