k3s
[Release-1.27] - k3s etcd-snapshot save fails on host with IPv6 only
Backport of the fix "Fix on-demand snapshots on ipv6-only nodes"
- #9214
Validated on Version:
- k3s version v1.27.11+k3s-11b31c28 (11b31c28)
Environment Details
Infrastructure Cloud EC2 instance
Node(s) CPU architecture, OS, and Version: SUSE Linux Enterprise Server 15 SP4
Cluster Configuration: 1 server node
Steps to validate the fix
- Install k3s on an IPv6-only node, with the server args set in the config file rather than on the CLI
k3s.io/node-args: '["server","--cluster-cidr","2001:cafe:42::/56","--service-cidr","2001:cafe:43::/108","--cluster-init","true","--node-ip","2600:1f1c:ab4:ee32:c44c:a8b3:4319:dad7","--write-kubeconfig-mode","644"]'
- Validate that taking an on-demand etcd snapshot works
- Validate nodes and pods
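The node-args shown below map one-to-one onto a k3s server config file. A minimal sketch of what that config might look like (written to a temp path here so the snippet is runnable anywhere; on a real node it would live at `/etc/rancher/k3s/config.yaml`, and the node IP is the one from this environment):

```shell
# Sketch of the IPv6-only server config used for this validation.
# Keys mirror the CLI flags in the node-args annotation below.
cat > /tmp/k3s-config.yaml <<'EOF'
cluster-init: true
cluster-cidr: 2001:cafe:42::/56
service-cidr: 2001:cafe:43::/108
node-ip: 2600:1f1c:ab4:ee32:c44c:a8b3:4319:dad7
write-kubeconfig-mode: "644"
EOF
cat /tmp/k3s-config.yaml
```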
Reproduction Issue:
k3s -v
k3s version v1.29.1+k3s-8224a3a7 (8224a3a7)
go version go1.21.6
kubectl get node -o yaml | grep node-args
k3s.io/node-args: '["server","--cluster-cidr","2001:cafe:42::/56","--service-cidr","2001:cafe:43::/108","--cluster-init","true","--node-ip","2600:1f1c:ab4:ee32:c44c:a8b3:4319:dad7","--write-kubeconfig-mode","644"]'
sudo k3s etcd-snapshot save
WARN[0000] Unknown flag --cluster-cidr found in config.yaml, skipping
WARN[0000] Unknown flag --service-cidr found in config.yaml, skipping
WARN[0000] Unknown flag --cluster-init found in config.yaml, skipping
WARN[0000] Unknown flag --write-kubeconfig-mode found in config.yaml, skipping
^C{"level":"warn","ts":"2024-02-15T19:39:34.996891Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00136e000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Canceled desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
{"level":"warn","ts":"2024-02-15T19:39:34.996862Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00136e000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Canceled desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
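The failure above shows the etcd client dialing the IPv4 loopback (`127.0.0.1:2379`) even though etcd on an IPv6-only node listens on the IPv6 loopback (the fixed run later fetches from `https://[::1]:2379`). As an illustration only, not the actual k3s code, the address-family-aware endpoint choice can be sketched as:

```shell
# Illustrative sketch: pick the loopback etcd endpoint based on the
# node IP's address family. IPv6 literals must be bracketed in URLs.
loopback_endpoint() {
  node_ip="$1"
  case "$node_ip" in
    *:*) echo "https://[::1]:2379" ;;      # IPv6 node -> IPv6 loopback
    *)   echo "https://127.0.0.1:2379" ;;  # IPv4 node -> IPv4 loopback
  esac
}

loopback_endpoint "2600:1f1c:ab4:ee32:c44c:a8b3:4319:dad7"
loopback_endpoint "172.31.7.127"
```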
Validation Results:
$ k3s -v
k3s version v1.27.11+k3s-11b31c28 (11b31c28)
go version go1.21.7
$ kubectl get nodes,pods -A
NAME STATUS ROLES AGE VERSION
node/i Ready control-plane,etcd,master 27s v1.27.11+k3s-11b31c28
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-77ccd57875-ddmj4 1/1 Running 0 12s
kube-system pod/helm-install-traefik-crd-bf757 1/1 Running 0 12s
kube-system pod/helm-install-traefik-n9dsv 1/1 Running 0 12s
kube-system pod/local-path-provisioner-79ffd768b5-vrj6t 1/1 Running 0 12s
kube-system pod/metrics-server-648b5df564-vhmkf 0/1 Running 0 12s
kubectl get node -o yaml | grep node-args
k3s.io/node-args: '["server","--cluster-cidr","2001:cafe:42::/56","--service-cidr","2001:cafe:43::/108","--cluster-init","true","--node-ip","2600:1f1c:ab4:ee32:c44c:a8b3:4319:dad7","--write-kubeconfig-mode","644"]'
sudo k3s etcd-snapshot save
WARN[0000] Unknown flag --cluster-cidr found in config.yaml, skipping
WARN[0000] Unknown flag --service-cidr found in config.yaml, skipping
WARN[0000] Unknown flag --cluster-init found in config.yaml, skipping
WARN[0000] Unknown flag --node-ip found in config.yaml, skipping
WARN[0000] Unknown flag --write-kubeconfig-mode found in config.yaml, skipping
INFO[0000] Saving etcd snapshot to /var/lib/rancher/k3s/server/db/snapshots/on-demand-i-041ae49edb4c36e85-1708100498
{"level":"info","ts":"2024-02-16T16:21:37.859227Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-i-041ae49edb4c36e85-1708100498.part"}
{"level":"info","ts":"2024-02-16T16:21:37.861482Z","logger":"client","caller":"[email protected]/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2024-02-16T16:21:37.861588Z","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://[::1]:2379"}
{"level":"info","ts":"2024-02-16T16:21:37.926281Z","logger":"client","caller":"[email protected]/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2024-02-16T16:21:37.936413Z","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://[::1]:2379","size":"3.0 MB","took":"now"}
{"level":"info","ts":"2024-02-16T16:21:37.936512Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-i-041ae49edb4c36e85-1708100498"}
INFO[0000] Reconciling ETCDSnapshotFile resources
INFO[0000] Reconciliation of ETCDSnapshotFile resources complete
Working as expected when the flags are set in the config file, but not when passed as CLI args. After talking with @brandond, we are leaving the CLI case behind for now so the rest of the fix can ship in this release.
Moving the CLI-args extension of the fix out to the next release, so it covers CLI args and not just the config file.
Validated on Version:
- k3s version v1.27.12+k3s-2d48b196 (2d48b196)
Environment Details
Infrastructure Cloud EC2 instance
Node(s) CPU architecture, OS, and Version: SUSE Linux Enterprise Server 15 SP4
Cluster Configuration:
- Split roles:
- 1 server
- 2 control-plane only
- 2 etcd only
- 2 workers
Steps to validate the fix
- create a cluster
- take etcd snapshot
- validate new outputs
- Restore
- Validate restore
- Validate nodes
- Validate pods
Reproduction Issue:
Validation Results:
$ sudo k3s etcd-snapshot save --etcd-s3
FATA[0000] see server log for details: s3 bucket name was not set
$ sudo k3s etcd-snapshot save \
--s3 \
--s3-bucket=" " \
--s3-access-key=" " \
--s3-secret-key=" + + " \
--s3-region="us-east-2" \
--s3-timeout=90s
INFO[0002] Snapshot on-demand-ip-172-31-7-127.us-east-2.compute.internal-1713184771 saved.
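As an aside, the S3 parameters passed on the CLI above can also be set in the server config file; the `etcd-s3-*` keys mirror the server CLI flags. A sketch only (written to a temp path so it is runnable; values are placeholders, redacted as in the report, and whether the snapshot subcommand reads them from config is an assumption here):

```shell
# Sketch: server-side S3 snapshot settings equivalent to the CLI flags.
# Bucket and credentials are placeholders, as in the report above.
cat > /tmp/k3s-s3-config.yaml <<'EOF'
etcd-s3: true
etcd-s3-bucket: "<bucket>"
etcd-s3-access-key: "<access-key>"
etcd-s3-secret-key: "<secret-key>"
etcd-s3-region: us-east-2
etcd-s3-timeout: 90s
EOF
cat /tmp/k3s-s3-config.yaml
```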
$ sudo k3s server \
--cluster-reset \
--etcd-s3 \
--cluster-reset-restore-path=" " \
--etcd-s3-bucket=" " \
--etcd-s3-region=us-east-2 \
--etcd-s3-access-key=" " \
--etcd-s3-secret-key=" "
INFO[0014] Managed etcd cluster membership has been reset, restart without --cluster-reset flag now. Backup and delete ${datadir}/server/db on each peer etcd server and rejoin the nodes
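The reset message above asks each peer etcd server to back up and delete `${datadir}/server/db` before rejoining. That per-peer step can be sketched as follows (using a temp dir as a stand-in for the datadir so the snippet is runnable anywhere; on a real node the datadir is `/var/lib/rancher/k3s` and the k3s service must be stopped first):

```shell
# Sketch of the per-peer cleanup the reset message asks for.
# A temp dir stands in for ${datadir}; on a real node, stop k3s first.
datadir=$(mktemp -d)
mkdir -p "$datadir/server/db"

# Back up (rather than delete outright) the old etcd database dir,
# then the node can be restarted to rejoin the reset cluster.
mv "$datadir/server/db" "$datadir/server/db.bak-$(date +%s)"
ls "$datadir/server"
```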
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip- .us-east-2.compute.internal Ready etcd 45m v1.27.12+k3s-2d48b196
ip- .us-east-2.compute.internal Ready control-plane,master 44m v1.27.12+k3s-2d48b196
ip- .us-east-2.compute.internal Ready <none> 43m v1.27.12+k3s-2d48b196
ip- .us-east-2.compute.internal Ready <none> 43m v1.27.12+k3s-2d48b196
ip- .us-east-2.compute.internal Ready control-plane,master 44m v1.27.12+k3s-2d48b196
ip- .us-east-2.compute.internal Ready <none> 42m v1.27.12+k3s-2d48b196
ip- .us-east-2.compute.internal Ready etcd 44m v1.27.12+k3s-2d48b196
ip- .us-east-2.compute.internal Ready control-plane,etcd,master 47m v1.27.12+k3s-2d48b196
ip-1 Ready control-plane,etcd,master 18s v1.27.12+k3s-2d48b196
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-77ccd57875-g69xw 1/1 Running 0 47m
kube-system helm-install-traefik-56hr2 0/1 Completed 1 47m
kube-system helm-install-traefik-crd-kbb4h 0/1 Completed 0 47m
kube-system local-path-provisioner-79ffd768b5-dpv4z 1/1 Running 0 47m
kube-system metrics-server-c44988498-ssdqv 1/1 Running 0 47m
kube-system svclb-traefik-737736ff-6vb2t 2/2 Running 0 43m
kube-system svclb-traefik-737736ff-d5bxl 2/2 Running 0 43m
kube-system svclb-traefik-737736ff-ff2jq 2/2 Running 0 47m
kube-system svclb-traefik-737736ff-n8qs7 2/2 Running 0 42m
kube-system svclb-traefik-737736ff-pr7pq 2/2 Running 0 44m
kube-system svclb-traefik-737736ff-wfpgq 2/2 Running 0 45m
kube-system traefik-7d5c94d587-4ns9b 1/1 Running 0 47m