Etcd storage does not appear to be migrated from v2 to v3
I noticed my cluster was not running and that the etcd service was failing. The failure was related to the enable-v2 flag, as described in https://github.com/canonical/microk8s/issues/5209. I removed the --enable-v2=true flag from the /var/snap/microk8s/8384/args/etcd file to match the fix from https://github.com/canonical/microk8s/pull/5212, but now I get a new error: illegal v2store content.
According to https://etcd.io/docs/v3.6/upgrades/upgrade_3_6/, a migration should be performed, but I cannot find any trace of a migration command (ETCDCTL_API=3 etcdctl migrate) in the microk8s repository, so perhaps we're missing the migration step?
Below are logs of the failed etcd startup from /var/log/syslog:
Sep 8 13:04:12 ckube-1 systemd[1]: Started Service for snap application microk8s.daemon-etcd.
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"warn","ts":"2025-09-08T13:04:12.556709Z","caller":"embed/config.go:1209","msg":"Running http and grpc server on single port. This is not recommended for production."}
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"warn","ts":"2025-09-08T13:04:12.557227Z","caller":"embed/config.go:1320","msg":"it isn't recommended to use default name, please set a value for --name. Note that etcd might run into issue when multiple members have the same default name","name":"default"}
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.557365Z","caller":"etcdmain/etcd.go:64","msg":"Running: ","args":["/snap/microk8s/8384/etcd","--data-dir=/var/snap/microk8s/common/var/run/etcd","--advertise-client-urls=https://192.168.5.70:12379","--listen-client-urls=https://0.0.0.0:12379","--client-cert-auth","--trusted-ca-file=/var/snap/microk8s/8384/certs/ca.crt","--cert-file=/var/snap/microk8s/8384/certs/server.crt","--key-file=/var/snap/microk8s/8384/certs/server.key"]}
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.557580Z","caller":"etcdmain/etcd.go:107","msg":"server has already been initialized","data-dir":"/var/snap/microk8s/common/var/run/etcd","dir-type":"member"}
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"warn","ts":"2025-09-08T13:04:12.557738Z","caller":"embed/config.go:1209","msg":"Running http and grpc server on single port. This is not recommended for production."}
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"warn","ts":"2025-09-08T13:04:12.557857Z","caller":"embed/config.go:1320","msg":"it isn't recommended to use default name, please set a value for --name. Note that etcd might run into issue when multiple members have the same default name","name":"default"}
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.557974Z","caller":"embed/etcd.go:138","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:2380"]}
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.558546Z","caller":"embed/etcd.go:146","msg":"configuring client listeners","listen-client-urls":["https://0.0.0.0:12379"]}
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.558843Z","caller":"embed/etcd.go:323","msg":"starting an etcd server","etcd-version":"3.6.4","git-sha":"5400cdc","go-version":"go1.23.11","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":true,"name":"default","data-dir":"/var/snap/microk8s/common/var/run/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/snap/microk8s/common/var/run/etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["http://localhost:2380"],"advertise-client-urls":["https://192.168.5.70:12379"],"listen-client-urls":["https://0.0.0.0:12379"],"listen-metrics-urls":[],"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"feature-gates":"","initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1,"v2-deprecation":"write-only"}
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.559479Z","logger":"bbolt","caller":"backend/backend.go:203","msg":"Opening db file (/var/snap/microk8s/common/var/run/etcd/member/snap/db) with mode -rw------- and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc0003d0118}"}
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.577801Z","logger":"bbolt","caller":"[email protected]/db.go:321","msg":"Opening bbolt db (/var/snap/microk8s/common/var/run/etcd/member/snap/db) successfully"}
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.577867Z","caller":"storage/backend.go:80","msg":"opened backend db","path":"/var/snap/microk8s/common/var/run/etcd/member/snap/db","took":"18.489834ms"}
Sep 8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.577900Z","caller":"etcdserver/bootstrap.go:220","msg":"restore consistentIndex","index":491955990}
Sep 8 13:04:13 ckube-1 microk8s.daemon-etcd[136524]: {"level":"error","ts":"2025-09-08T13:04:13.111288Z","caller":"etcdserver/bootstrap.go:409","msg":"illegal v2store content","error":"detected disallowed custom content in v2store for stage --v2-deprecation=write-only","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.recoverSnapshot\n\tgo.etcd.io/etcd/server/v3/etcdserver/bootstrap.go:409\ngo.etcd.io/etcd/server/v3/etcdserver.bootstrapBackend\n\tgo.etcd.io/etcd/server/v3/etcdserver/bootstrap.go:225\ngo.etcd.io/etcd/server/v3/etcdserver.bootstrap\n\tgo.etcd.io/etcd/server/v3/etcdserver/bootstrap.go:80\ngo.etcd.io/etcd/server/v3/etcdserver.NewServer\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:307\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:262\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:207\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:114\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:272"}
Sep 8 13:04:13 ckube-1 microk8s.daemon-etcd[136524]: {"level":"error","ts":"2025-09-08T13:04:13.118940Z","caller":"etcdserver/server.go:309","msg":"bootstrap failed","error":"detected disallowed custom content in v2store for stage --v2-deprecation=write-only","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.NewServer\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:309\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:262\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:207\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:114\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:272"}
Sep 8 13:04:13 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:13.119030Z","caller":"embed/etcd.go:426","msg":"closing etcd server","name":"default","data-dir":"/var/snap/microk8s/common/var/run/etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["https://192.168.5.70:12379"]}
Sep 8 13:04:13 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:13.119151Z","caller":"embed/etcd.go:428","msg":"closed etcd server","name":"default","data-dir":"/var/snap/microk8s/common/var/run/etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["https://192.168.5.70:12379"]}
Sep 8 13:04:13 ckube-1 microk8s.daemon-etcd[136524]: {"level":"fatal","ts":"2025-09-08T13:04:13.119188Z","caller":"etcdmain/etcd.go:183","msg":"discovery failed","error":"detected disallowed custom content in v2store for stage --v2-deprecation=write-only","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:183\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:272"}
Sep 8 13:04:13 ckube-1 systemd[1]: snap.microk8s.daemon-etcd.service: Main process exited, code=exited, status=1/FAILURE
Sep 8 13:04:13 ckube-1 systemd[1]: snap.microk8s.daemon-etcd.service: Failed with result 'exit-code'.
Sep 8 13:04:13 ckube-1 systemd[1]: snap.microk8s.daemon-etcd.service: Scheduled restart job, restart counter is at 1.
Sep 8 13:04:13 ckube-1 systemd[1]: Stopped Service for snap application microk8s.daemon-etcd.
Summary
The new version of etcd expects the storage to have been migrated off the v2 store, but this migration never seems to be executed when upgrading existing microk8s deployments.
What Should Happen Instead?
microk8s should perform the etcd storage migration before removing the v2 storage.
Reproduction Steps
I have 3 clusters; 2 of them use etcd and are experiencing the same issue. The one using Dqlite is unaffected.
My snap reports
snap-id: EaXqgt1lyCaxKaQCU349mlodBkDCXRcg
tracking: latest/stable
refresh-date: today at 10:42 UTC
installed: v1.34.0 (8384) 183MB classic
The current content of my etcd args file (/var/snap/microk8s/8384/args/etcd):
--data-dir=${SNAP_COMMON}/var/run/etcd
--advertise-client-urls=https://${DEFAULT_INTERFACE_IP_ADDR}:12379
--listen-client-urls=https://0.0.0.0:12379
--client-cert-auth
--trusted-ca-file=${SNAP_DATA}/certs/ca.crt
--cert-file=${SNAP_DATA}/certs/server.crt
--key-file=${SNAP_DATA}/certs/server.key
Introspection Report
Inspecting system
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-kubelite is running
Service snap.microk8s.daemon-flanneld is running
FAIL: Service snap.microk8s.daemon-etcd is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-etcd
Service snap.microk8s.daemon-apiserver-kicker is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy openSSL information to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy current linux distribution to the final report tarball
Copy asnycio usage and limits to the final report tarball
Copy inotify max_user_instances and max_user_watches to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
WARNING: Maximum number of inotify user instances is less than the recommended value of 1024.
Increase the limit with:
echo fs.inotify.max_user_instances=1024 | sudo tee -a /etc/sysctl.conf
sudo sysctl --system
WARNING: Maximum number of inotify user watches is less than the recommended value of 1048576.
Increase the limit with:
echo fs.inotify.max_user_watches=1048576 | sudo tee -a /etc/sysctl.conf
sudo sysctl --system
Building the report tarball
Report tarball is at /var/snap/microk8s/8384/inspection-report-20250908_132532.tar.gz
Can you suggest a fix?
It seems that at some point the migration from v2 to v3 storage should be performed, before the v2 storage is turned off.
See
- https://etcd.io/docs/v3.6/upgrades/upgrade_3_6/
- https://etcd.io/docs/v3.4/op-guide/v2-migration/
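For reference, the migration step described in the second link above (an etcdctl command that existed up to v3.4 and was removed in 3.5) looked roughly like the following. This is only a sketch, with the data dir taken from my logs; I have not verified it against the bundled binaries:
# run while etcd is stopped; migrates the v2 keyspace into the v3 store
ETCDCTL_API=3 etcdctl migrate --data-dir=/var/snap/microk8s/common/var/run/etcd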
Are you interested in contributing with a fix?
Yes, but would need some guidance on where to put the migration script.
See my issue #5209.
When running a cluster in non-HA mode, flannel is used with etcd as the data store. This default needs to be switched to the Kubernetes store: https://microk8s.io/docs/change-cidr#flannel-with-kubernetes-store
I will be moving off microk8s due to this lack of testing and support of their own features.
I was able to get my clusters running for the time being by reverting to 1.33:
snap refresh microk8s --channel=1.33/stable --classic
It seems that even the etcdctl bundled with v1.33.0 (rev 8205) is already too new to perform the migration, per https://github.com/etcd-io/etcd/issues/14058:
$ snap run --shell microk8s
$ ETCDCTL_API=3 $SNAP/etcdctl version
etcdctl version: 3.5.17
API version: 3.5
$ ETCDCTL_API=3 $SNAP/etcdctl migrate
Error: unknown command "migrate" for "etcdctl"
Run 'etcdctl --help' for usage.
Error: unknown command "migrate" for "etcdctl"
I resolved this issue on my lab cluster by commenting out the --enable-v2=true line in /var/snap/microk8s/8384/args/etcd; my file is below:
--data-dir=${SNAP_COMMON}/var/run/etcd
--advertise-client-urls=https://${DEFAULT_INTERFACE_IP_ADDR}:12379
--listen-client-urls=https://0.0.0.0:12379
--client-cert-auth
--trusted-ca-file=${SNAP_DATA}/certs/ca.crt
--cert-file=${SNAP_DATA}/certs/server.crt
--key-file=${SNAP_DATA}/certs/server.key
#--enable-v2=true
and then started the snap.microk8s.daemon-etcd service, and microk8s started successfully.
I tried checking for v2 keys manually and deleted a couple of entries that I found.
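To enumerate everything still stored under the v2 keyspace before deleting, a recursive listing should work. This is a sketch using the standard ?recursive=true parameter of the v2 API, with the same TLS flags as the GET further down:
# run from the snap shell (snap run --shell microk8s) so that $SNAP_DATA is set
sudo curl --cacert $SNAP_DATA/certs/ca.crt --cert $SNAP_DATA/certs/server.crt --key $SNAP_DATA/certs/server.key 'https://127.0.0.1:12379/v2/keys/?recursive=true' -s | jq .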
sudo curl --cacert $SNAP_DATA/certs/ca.crt --cert $SNAP_DATA/certs/server.crt --key $SNAP_DATA/certs/server.key -X DELETE https://127.0.0.1:12379/v2/keys/coreos.com/network/config -s
sudo curl --cacert $SNAP_DATA/certs/ca.crt --cert $SNAP_DATA/certs/server.crt --key $SNAP_DATA/certs/server.key -X DELETE 'https://127.0.0.1:12379/v2/keys/coreos.com/network/subnets?dir=true' -s
sudo curl --cacert $SNAP_DATA/certs/ca.crt --cert $SNAP_DATA/certs/server.crt --key $SNAP_DATA/certs/server.key -X DELETE 'https://127.0.0.1:12379/v2/keys/coreos.com/network?dir=true' -s
sudo curl --cacert $SNAP_DATA/certs/ca.crt --cert $SNAP_DATA/certs/server.crt --key $SNAP_DATA/certs/server.key -X DELETE 'https://127.0.0.1:12379/v2/keys/coreos.com?dir=true' -s
until I got
$ sudo curl --cacert $SNAP_DATA/certs/ca.crt --cert $SNAP_DATA/certs/server.crt --key $SNAP_DATA/certs/server.key https://127.0.0.1:12379/v2/keys -s | jq .
{
"action": "get",
"node": {
"dir": true
}
}
Still getting illegal v2store content error on 1.34
Hey @Srokap,
Thanks for reporting this issue. As far as I'm aware, MicroK8s and Kubernetes switched to v3 a while back, and the --enable-v2 flag was kept around for flannel compatibility. The /coreos.com keys are used by flannel. Do we know how old this cluster is and which version was initially deployed? For clusters on 1.27+ there should not be any v2 data, since flannel was also switched to etcd v3 in 1.27.
We're currently working on a migration path to offer a better UX. This process seems like it can be hit or miss and could be tricky; see https://github.com/etcd-io/etcd/discussions/20231 for current workarounds and approaches.
The main observation I have is that etcd needs to create an internal snapshot after the removal of the v2 data. This can be achieved by appending --snapshot-count=1 to /var/snap/microk8s/current/args/etcd, which reduces the number of entries required before a snapshot is created. After adding the flag, etcd should be restarted; the flag can be removed again once etcd has a successful startup.
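For example, something along these lines should do it (untested sketch, restarting only the etcd daemon):
# append the flag and restart etcd; remove the flag again after a clean startup
echo '--snapshot-count=1' | sudo tee -a /var/snap/microk8s/current/args/etcd
sudo systemctl restart snap.microk8s.daemon-etcd.service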
I'll update here with 2 possible workaround scripts that will get rid of the v2 data, and we'll look into integrating this directly into our upgrade process.
@berkayoz Both of my clusters are around 4 years old, so the initial version was likely 1.19. Initially I was trying an HA deployment, which is probably when I swapped to etcd; pretty soon I opted to just run separate 1-node clusters instead.
I did verify that flannel's entries are also present in the v3 store, so I'm mostly looking into wiping the v2 data somehow.
It seems that the best time to do that was when etcdctl still had the migrate command, which would be pre-3.5 as I understand it.
As for snapshots, I think there are 4 snapshots in my data dir currently. When I have a moment, I will try to change the snapshot setting and see how it goes. I'll post here if I have anything new.
Ok, the --snapshot-count=1 seems to have helped. Let me list the steps I took to recover the cluster:
Delete the old keys
snap run --shell microk8s
sudo curl --cacert $SNAP_DATA/certs/ca.crt --cert $SNAP_DATA/certs/server.crt --key $SNAP_DATA/certs/server.key -X DELETE https://127.0.0.1:12379/v2/keys/coreos.com/network/config -s
sudo curl --cacert $SNAP_DATA/certs/ca.crt --cert $SNAP_DATA/certs/server.crt --key $SNAP_DATA/certs/server.key -X DELETE 'https://127.0.0.1:12379/v2/keys/coreos.com/network/subnets?dir=true' -s
sudo curl --cacert $SNAP_DATA/certs/ca.crt --cert $SNAP_DATA/certs/server.crt --key $SNAP_DATA/certs/server.key -X DELETE 'https://127.0.0.1:12379/v2/keys/coreos.com/network?dir=true' -s
sudo curl --cacert $SNAP_DATA/certs/ca.crt --cert $SNAP_DATA/certs/server.crt --key $SNAP_DATA/certs/server.key -X DELETE 'https://127.0.0.1:12379/v2/keys/coreos.com?dir=true' -s
Use the following command to confirm that there are no more v2 entries:
sudo curl --cacert $SNAP_DATA/certs/ca.crt --cert $SNAP_DATA/certs/server.crt --key $SNAP_DATA/certs/server.key https://127.0.0.1:12379/v2/keys -s
Should return
{"action":"get","node":{"dir":true}}
Update etcd parameters
snap run --shell microk8s
sudo nano $SNAP_DATA/args/etcd
- Remove the --enable-v2=true flag
- Add the --snapshot-count=1 flag (a non-interactive equivalent is sketched below)
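The non-interactive equivalent could look like this (hypothetical, not the exact commands I ran):
# from the snap shell, so $SNAP_DATA points at the current revision
sudo sed -i '/--enable-v2=true/d' $SNAP_DATA/args/etcd
echo '--snapshot-count=1' | sudo tee -a $SNAP_DATA/args/etcd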
Restart etcd
sudo microk8s.stop
sudo microk8s.start
You can observe the snapshot files (*.snap) by running the following command:
sudo ls -al $SNAP_COMMON/var/run/etcd/member/snap
Snapshot files will start showing up rapidly, so you need to stop the service quickly:
sudo microk8s.stop
sudo nano $SNAP_DATA/args/etcd
Remove the --snapshot-count=1 flag and then run:
sudo microk8s.start
The list of snapshots should shrink considerably (I have only 5 remaining), and all snapshots older than the current day should be gone.
Update the snap
sudo snap refresh microk8s --channel=1.34/stable --classic
At this point, the cluster starts up.
50% success rate
On one of the clusters it worked flawlessly, but on the other, when I ran
tail -f /var/log/syslog | grep microk8s.daemon-etcd
I was getting spammed with errors:
Sep 12 00:09:05 ckube-1 microk8s.daemon-etcd[570915]: {"level":"warn","ts":"2025-09-12T00:09:05.844984Z","caller":"etcdserver/server.go:2304","msg":"Failed to detect schema version","error":"missing confstate information"}
Sep 12 00:09:05 ckube-1 microk8s.daemon-etcd[570915]: {"level":"error","ts":"2025-09-12T00:09:05.845055Z","caller":"version/monitor.go:120","msg":"failed to update storage version","cluster-version":"3.6.0","error":"cannot detect storage schema version: missing confstate information","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver/version.(*Monitor).UpdateStorageVersionIfNeeded\n\tgo.etcd.io/etcd/server/v3/etcdserver/version/monitor.go:120\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).monitorStorageVersion\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2345\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).GoAttach.func1\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2526"}
Sep 12 00:09:09 ckube-1 microk8s.daemon-etcd[570915]: {"level":"warn","ts":"2025-09-12T00:09:09.846165Z","caller":"etcdserver/server.go:2304","msg":"Failed to detect schema version","error":"missing confstate information"}
And in the cluster, pending pods show the following events:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1aa2c9cfa38d43f398aabee4c2621fc3efaea710c3109e56578f437e3011a8aa": plugin type="flannel" name="flannel-plugin" failed (add): failed to allocate for range 0: no IP addresses available in range set: 10.1.75.1-10.1.75.254
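Listing the flannel subnet leases still registered in etcd might show what is exhausting the range; a sketch using the same key prefix that I delete in a later comment (not verified on this cluster):
# from the snap shell; lists only the lease keys under the flannel prefix
sudo $SNAP/etcdctl --endpoints=127.0.0.1:12379 --cacert=$SNAP_DATA/certs/ca.crt --cert=$SNAP_DATA/certs/server.crt --key=$SNAP_DATA/certs/server.key get /coreos.com/network/subnets --prefix --keys-only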
I also can no longer downgrade to 1.33, so I will have to figure something out for that.
Read my earlier comment and my other issue, and implement the following. I was able to get 1.34 to work without HA:
https://microk8s.io/docs/change-cidr#flannel-with-kubernetes-store
Hey folks,
Here's a workaround guide for getting rid of v2 data and upgrading to 1.34 https://gist.github.com/berkayoz/1a04ebcd95494c08f2f01836cb16dbf9
@Srokap thanks for trying the suggestion out. I've not encountered the issue you are facing with your other cluster so far. Is etcd still able to serve requests there? If it is, we might try performing a snapshot save and restore.
I was able to make a snapshot with the following command:
sudo $SNAP/etcdctl --endpoints=127.0.0.1:12379 --cacert=$SNAP_DATA/certs/ca.crt --cert=$SNAP_DATA/certs/server.crt --key=$SNAP_DATA/certs/server.key snapshot save etcd-snapshot-2025-09-12.db
I was looking at https://etcd.io/docs/v3.5/op-guide/recovery/
Grab the matching etcdutl binary
wget https://github.com/etcd-io/etcd/releases/download/v3.6.4/etcd-v3.6.4-linux-amd64.tar.gz
tar zxf etcd-v3.6.4-linux-amd64.tar.gz
I can inspect the snapshot
$ sudo etcd-v3.6.4-linux-amd64/etcdutl snapshot status etcd-snapshot-2025-09-12.db -w table
+---------+-----------+------------+------------+---------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE | VERSION |
+---------+-----------+------------+------------+---------+
| 35a60b8 | 354045849 | 4181 | 122 MB | |
+---------+-----------+------------+------------+---------+
Then I moved the old data dir aside and restored from the snapshot:
sudo mv $SNAP_COMMON/var/run/etcd/ $SNAP_COMMON/var/run/etcd.2025-09-12
sudo etcd-v3.6.4-linux-amd64/etcdutl snapshot restore etcd-snapshot-2025-09-12.db --data-dir $SNAP_COMMON/var/run/etcd/
sudo microk8s.start
This, however, ends up with the failed to update storage version errors again.
The migrate command in etcdutl also fails with the same error:
$ sudo etcd-v3.6.4-linux-amd64/etcdutl migrate --data-dir $SNAP_COMMON/var/run/etcd/ --target-version 3.6
2025-09-12T22:05:18Z info bbolt backend/backend.go:203 Opening db file (/var/snap/microk8s/common/var/run/etcd/member/snap/db) with mode -rw------- and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: , ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc000384118}
2025-09-12T22:05:18Z info bbolt [email protected]/db.go:321 Opening bbolt db (/var/snap/microk8s/common/var/run/etcd/member/snap/db) successfully
2025-09-12T22:05:18Z error etcdutl/migrate_command.go:132 failed to detect storage version. Please make sure you are using data dir from etcd v3.5 and older
go.etcd.io/etcd/etcdutl/v3/etcdutl.migrateCommandFunc
go.etcd.io/etcd/etcdutl/v3/etcdutl/migrate_command.go:132
go.etcd.io/etcd/etcdutl/v3/etcdutl.NewMigrateCommand.func1
go.etcd.io/etcd/etcdutl/v3/etcdutl/migrate_command.go:44
github.com/spf13/cobra.(*Command).execute
github.com/spf13/[email protected]/command.go:989
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/[email protected]/command.go:1117
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/[email protected]/command.go:1041
main.Start
go.etcd.io/etcd/etcdutl/v3/ctl.go:54
main.main
go.etcd.io/etcd/etcdutl/v3/main.go:23
runtime.main
runtime/proc.go:272
Error: missing confstate information
Apparently the key detection fails here https://github.com/etcd-io/etcd/blob/release-3.6/server/storage/schema/confstate.go#L40 due to a missing confState key, and supposedly confState has been persisted in the backend since etcd v3.5. Weird.
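If someone wants to double-check whether the confState key is really missing, the meta bucket of the bolt file could be inspected directly. A rough sketch using the bbolt CLI (a separate tool, installable with e.g. go install go.etcd.io/bbolt/cmd/bbolt@latest; etcd must be stopped, and the bucket/key names are my assumption based on the linked code):
# run with enough privileges to read the db file;
# a healthy v3.5+ backend should list a "confState" key in the "meta" bucket
bbolt keys /var/snap/microk8s/common/var/run/etcd/member/snap/db meta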
I also looked at an older version of etcdutl:
wget https://github.com/etcd-io/etcd/releases/download/v3.5.22/etcd-v3.5.22-linux-amd64.tar.gz
tar zxf etcd-v3.5.22-linux-amd64.tar.gz
$ sudo etcd-v3.5.22-linux-amd64/etcdutl check v2store --data-dir $SNAP_COMMON/var/run/etcd/
No custom content found in v2store.
At this point I went back to 1.33 again and noticed a bunch of networking errors in the logs:
09-4605-8b21-808070b07667)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"8766e0a1b9db5453965f7f3f7e4dc1e9874ef20ca5df08e4aa0e27c812c3a44e\\\": plugin type=\\\"flannel\\\" name=\\\"flannel-plugin\\\" failed (add): failed to delegate add: failed to allocate for range 0: no IP addresses available in range set: 10.1.75.1-10.1.75.254\"" pod="gitlab/gitlab-minio-85f46dd84c-zfr29" podUID="ac8e1f51-b709-4605-8b21-808070b07667"
Sep 12 22:32:01 ckube-1 systemd[1]: run-netns-cni\x2dd5d58720\x2dbba6\x2d0c0c\x2d5c3d\x2dd95136b53fc4.mount: Deactivated successfully.
Sep 12 22:32:01 ckube-1 systemd[1]: var-snap-microk8s-common-run-containerd-io.containerd.grpc.v1.cri-sandboxes-8766e0a1b9db5453965f7f3f7e4dc1e9874ef20ca5df08e4aa0e27c812c3a44e-shm.mount: Deactivated successfully.
Sep 12 22:32:01 ckube-1 networkctl[1606356]: Interface "veth1b430527" not found.
Sep 12 22:32:01 ckube-1 networkd-dispatcher[774]: ERROR:Failed to get interface "veth1b430527" status: Command '['/usr/bin/networkctl', 'status', '--no-pager', '--no-legend', '--', 'veth1b430527']' returned non-zero exit status 1.
Sep 12 22:32:01 ckube-1 networkctl[1606359]: Interface "veth603bf0f1" not found.
Sep 12 22:32:01 ckube-1 systemd[1]: Cannot find unit for notify message of PID 1606356, ignoring.
Sep 12 22:32:01 ckube-1 systemd[1]: networkd-dispatcher.service: Got notification message from PID 1606359, but reception only permitted for main PID 774
Sep 12 22:32:01 ckube-1 networkd-dispatcher[774]: ERROR:Failed to get interface "veth603bf0f1" status: Command '['/usr/bin/networkctl', 'status', '--no-pager', '--no-legend', '--', 'veth603bf0f1']' returned non-zero exit status 1.
Reboot didn't help. Done fighting with it for today.
I have a bit of an update. It looks like my flanneld got really messed up and required a configuration wipe. Below are the commands that should do the trick:
snap run --shell microk8s
sudo microk8s.stop
sudo rm $SNAP_DATA/args/cni-network/flannel.conflist
sudo rm $SNAP_COMMON/run/flannel/subnet.env
sudo microk8s.start
sudo $SNAP/etcdctl --endpoints=127.0.0.1:12379 --cacert=$SNAP_DATA/certs/ca.crt --cert=$SNAP_DATA/certs/server.crt --key=$SNAP_DATA/certs/server.key del /coreos.com/network/subnets --prefix
Earlier, I did try just deleting the $SNAP_COMMON/run/flannel/subnet.env file and restarting the service, but that was NOT enough.
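For reference, restarting just the flanneld daemon means restarting its systemd unit (the same unit queried with journalctl below), roughly:
sudo systemctl restart snap.microk8s.daemon-flanneld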
The key problem seems to have been a "revoked lease" in the flanneld logs; when running sudo journalctl -u snap.microk8s.daemon-flanneld -n 100 --no-pager I got:
Oct 04 23:09:07 ckube-1 microk8s.daemon-flanneld[2498807]: I1004 23:09:07.241349 2498807 registry.go:330] watchSubnets: got valid subnet event with revision 117766
Oct 04 23:09:07 ckube-1 microk8s.daemon-flanneld[2498807]: E1004 23:09:07.241716 2498807 local_manager.go:402] Lease has been revoked. Shutting down daemon.
Oct 04 23:09:07 ckube-1 microk8s.daemon-flanneld[2498807]: E1004 23:09:07.241760 2498807 main.go:481] CompleteLease execute error err: interrupted
Oct 04 23:09:07 ckube-1 microk8s.daemon-flanneld[2498807]: I1004 23:09:07.241800 2498807 main.go:488] Waiting for all goroutines to exit
Oct 04 23:09:07 ckube-1 microk8s.daemon-flanneld[2498807]: I1004 23:09:07.241996 2498807 main.go:499] Stopping shutdownHandler...
Oct 04 23:09:07 ckube-1 microk8s.daemon-flanneld[2498807]: E1004 23:09:07.242199 2498807 subnet.go:135] could not watch leases: context canceled
Oct 04 23:09:07 ckube-1 microk8s.daemon-flanneld[2498807]: I1004 23:09:07.242206 2498807 subnet.go:180] context canceled, close receiver chan
Oct 04 23:09:07 ckube-1 microk8s.daemon-flanneld[2498807]: I1004 23:09:07.242224 2498807 vxlan_network.go:79] evts chan closed
Oct 04 23:09:07 ckube-1 microk8s.daemon-flanneld[2498807]: I1004 23:09:07.242260 2498807 subnet.go:207] leaseWatchChan channel closed
Oct 04 23:09:07 ckube-1 microk8s.daemon-flanneld[2498807]: I1004 23:09:07.242312 2498807 main.go:491] Exiting cleanly...
Oct 04 23:09:07 ckube-1 systemd[1]: snap.microk8s.daemon-flanneld.service: Deactivated successfully.
Oct 04 23:09:07 ckube-1 systemd[1]: snap.microk8s.daemon-flanneld.service: Consumed 2.836s CPU time.
Oct 04 23:09:13 ckube-1 systemd[1]: Started Service for snap application microk8s.daemon-flanneld.
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508966]: {"level":"warn","ts":"2025-10-04T23:09:18.237020Z","caller":"flags/flag.go:94","msg":"unrecognized environment variable","environment-variable":"ETCDCTL_API=3"}
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508966]: 1
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508972]: {"level":"warn","ts":"2025-10-04T23:09:18.255883Z","caller":"flags/flag.go:94","msg":"unrecognized environment variable","environment-variable":"ETCDCTL_API=3"}
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508972]: OK
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.295591 2508552 main.go:213] CLI flags config: {etcdEndpoints:https://127.0.0.1:12379 etcdPrefix:/coreos.com/network etcdKeyfile:/var/snap/microk8s/8474/certs/server.key etcdCertfile:/var/snap/microk8s/8474/certs/server.crt etcdCAFile:/var/snap/microk8s/8474/certs/ca.crt etcdUsername: etcdPassword: version:false kubeSubnetMgr:false kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/var/snap/microk8s/common/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true blackholeRoute:false netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true}
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: W1004 23:09:18.295837 2508552 main.go:608] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /var/snap/microk8s/common/run/flannel/subnet.env
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.297331 2508552 main.go:239] Created subnet manager: Etcd Local Manager with Previous Subnet: 10.1.75.0/24
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.297352 2508552 main.go:242] Installing signal handlers
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.304286 2508552 main.go:519] Found network config - Backend type: vxlan
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.304336 2508552 match.go:211] Determining IP address of default interface
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.304834 2508552 match.go:264] Using interface with name eno1 and address 192.168.5.70
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.304860 2508552 match.go:286] Defaulting external address to interface address (192.168.5.70)
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.304917 2508552 vxlan.go:141] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.306345 2508552 local_manager.go:189] Found previously leased subnet (10.1.75.0/24), reusing
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.309343 2508552 local_manager.go:209] Allocated lease (ip: 10.1.75.0/24 ipv6: ::/0) to current node (192.168.5.70)
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.309662 2508552 main.go:375] Cleaning-up unused traffic manager rules
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.309686 2508552 nftables.go:278] Cleaning-up nftables rules...
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.459052 2508552 iptables.go:50] Starting flannel in iptables mode...
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: W1004 23:09:18.459578 2508552 main.go:608] no subnet found for key: FLANNEL_IPV6_NETWORK in file: /var/snap/microk8s/common/run/flannel/subnet.env
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: W1004 23:09:18.459712 2508552 main.go:608] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /var/snap/microk8s/common/run/flannel/subnet.env
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.459783 2508552 iptables.go:110] Setting up masking rules
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.478694 2508552 iptables.go:211] Changing default FORWARD chain policy to ACCEPT
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.491703 2508552 main.go:463] Wrote subnet file to /var/snap/microk8s/common/run/flannel/subnet.env
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.491749 2508552 main.go:467] Running backend.
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.491809 2508552 vxlan_network.go:65] watching for new subnet leases
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.494856 2508552 local_manager.go:322] manager.WatchLease: sending reset results...
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.494935 2508552 registry.go:293] registry: watching subnets starting from rev 117789
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.495023 2508552 local_manager.go:399] Waiting for 22h59m58.999814707s to renew lease
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.511634 2508552 iptables.go:357] bootstrap done
Oct 04 23:09:18 ckube-1 microk8s.daemon-flanneld[2508552]: I1004 23:09:18.520357 2508552 iptables.go:357] bootstrap done
The key error is Lease has been revoked. Shutting down daemon.
After the wipe I can finally see pods starting up again. Healthy flanneld log below
Oct 04 23:17:37 ckube-1 systemd[1]: Started Service for snap application microk8s.daemon-flanneld.
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534574]: {"level":"warn","ts":"2025-10-04T23:17:42.294277Z","caller":"flags/flag.go:94","msg":"unrecognized environment variable","environment-variable":"ETCDCTL_API=3"}
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534574]: 1
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534593]: {"level":"warn","ts":"2025-10-04T23:17:42.358150Z","caller":"flags/flag.go:94","msg":"unrecognized environment variable","environment-variable":"ETCDCTL_API=3"}
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534593]: OK
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.430390 2534016 main.go:213] CLI flags config: {etcdEndpoints:https://127.0.0.1:12379 etcdPrefix:/coreos.com/network etcdKeyfile:/var/snap/microk8s/8474/certs/server.key etcdCertfile:/var/snap/microk8s/8474/certs/server.crt etcdCAFile:/var/snap/microk8s/8474/certs/ca.crt etcdUsername: etcdPassword: version:false kubeSubnetMgr:false kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/var/snap/microk8s/common/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true blackholeRoute:false netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true}
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: W1004 23:17:42.430978 2534016 main.go:573] no subnet found for key: FLANNEL_SUBNET in file: /var/snap/microk8s/common/run/flannel/subnet.env
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: W1004 23:17:42.431130 2534016 main.go:608] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /var/snap/microk8s/common/run/flannel/subnet.env
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.434651 2534016 main.go:239] Created subnet manager: Etcd Local Manager with Previous Subnet: None
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.434875 2534016 main.go:242] Installing signal handlers
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.445371 2534016 main.go:519] Found network config - Backend type: vxlan
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.445567 2534016 match.go:211] Determining IP address of default interface
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.446076 2534016 match.go:264] Using interface with name eno1 and address 192.168.5.70
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.446102 2534016 match.go:286] Defaulting external address to interface address (192.168.5.70)
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.446211 2534016 vxlan.go:141] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.448472 2534016 local_manager.go:226] Picking subnet in range 10.1.1.0 ... 10.1.255.0
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.452787 2534016 local_manager.go:209] Allocated lease (ip: 10.1.74.0/24 ipv6: ::/0) to current node (192.168.5.70)
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.455273 2534016 iface.go:282] removed IP address 10.1.75.0/32 flannel.1 from flannel.1
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.457350 2534016 main.go:375] Cleaning-up unused traffic manager rules
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.457393 2534016 nftables.go:278] Cleaning-up nftables rules...
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.589840 2534016 iptables.go:50] Starting flannel in iptables mode...
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: W1004 23:17:42.589875 2534016 main.go:573] no subnet found for key: FLANNEL_NETWORK in file: /var/snap/microk8s/common/run/flannel/subnet.env
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: W1004 23:17:42.589887 2534016 main.go:573] no subnet found for key: FLANNEL_SUBNET in file: /var/snap/microk8s/common/run/flannel/subnet.env
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: W1004 23:17:42.589900 2534016 main.go:608] no subnet found for key: FLANNEL_IPV6_NETWORK in file: /var/snap/microk8s/common/run/flannel/subnet.env
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: W1004 23:17:42.589910 2534016 main.go:608] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /var/snap/microk8s/common/run/flannel/subnet.env
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.589919 2534016 iptables.go:100] Current network or subnet (10.1.0.0/16, 10.1.74.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.618141 2534016 iptables.go:110] Setting up masking rules
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.624077 2534016 iptables.go:211] Changing default FORWARD chain policy to ACCEPT
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.629313 2534016 main.go:463] Wrote subnet file to /var/snap/microk8s/common/run/flannel/subnet.env
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.629341 2534016 main.go:467] Running backend.
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.629505 2534016 vxlan_network.go:65] watching for new subnet leases
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.631549 2534016 registry.go:293] registry: watching subnets starting from rev 118442
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.631703 2534016 local_manager.go:322] manager.WatchLease: sending reset results...
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.631737 2534016 local_manager.go:399] Waiting for 22h59m58.999958094s to renew lease
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.644620 2534016 iptables.go:357] bootstrap done
Oct 04 23:17:42 ckube-1 microk8s.daemon-flanneld[2534016]: I1004 23:17:42.655295 2534016 iptables.go:357] bootstrap done
As for cluster recovery, I did end up wiping my etcd, but before that I managed to successfully run https://github.com/WoozyMasta/kube-dump against my busted cluster and got all the API data as YAML files with ./kube-dump all.
I might try to restore a backup again to see if it was flannel messing with me all this time, but not today.
Oddly, another cluster started having flannel problems out of the blue (I was tweaking certificate IPs and restarting it), and I needed to run a similar procedure:
snap run --shell microk8s
sudo $SNAP/etcdctl --endpoints=127.0.0.1:12379 --cacert=$SNAP_DATA/certs/ca.crt --cert=$SNAP_DATA/certs/server.crt --key=$SNAP_DATA/certs/server.key del /coreos.com/network/subnets --prefix
sudo microk8s.stop
sudo rm $SNAP_DATA/args/cni-network/flannel.conflist
sudo rm $SNAP_COMMON/run/flannel/subnet.env
sudo microk8s.start
It's somewhat suspicious that in both cases it was necessary to delete the etcd keys.