system-upgrade-controller
system-upgrade-controller copied to clipboard
K3s Upgrade Plan, 5 of 6 nodes upgraded, one node was terminated with badrequest
Version v0.9.1
Platform/Architecture pi@dairy:~ $ uname -a Linux dairy 5.15.32-v8+ #1538 SMP PREEMPT Thu Mar 31 19:40:39 BST 2022 aarch64 GNU/Linux
Describe the bug 5 out of 6 nodes upgraded to latest K3S successfully, one node was terminated with BadRequest
To Reproduce Run the OOB plan to upgrade to latest K3S. I also saw all 3 server nodes were upgraded first then the worker nodes were then processed but one failed.
Expected behavior All 6 nodes were upgraded successfully
Actual behavior 5 of 6 were upgraded
Additional context
Davids-iMac:K3s$ k get nodes
NAME STATUS ROLES AGE VERSION
dairy Ready,SchedulingDisabled
NAME COMPLETIONS DURATION AGE job.batch/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793 0/1 15h 15h
Davids-iMac:K3s$ k describe job.batch/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793 -n system-upgrade
Name: apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793
Namespace: system-upgrade
Selector: controller-uid=728cbed1-ce28-46ef-be2c-8a640164a121
Labels: objectset.rio.cattle.io/hash=80f52c1aa7257a6b5bd08982446fceff8c1a2394
plan.upgrade.cattle.io/k3s-agent=f956d4c7ff118941fde09e3b9cc124faece4ef5a06f0fdc4c91051fc
upgrade.cattle.io/controller=system-upgrade-controller
upgrade.cattle.io/node=dairy
upgrade.cattle.io/plan=k3s-agent
upgrade.cattle.io/version=v1.23.5-k3s1
Annotations: batch.kubernetes.io/job-tracking:
objectset.rio.cattle.io/applied:
H4sIAAAAAAAA/+xY227bOBN+lf/nteTKiXOQgb3wxu7WaOMYddpFUQQBTY5srilSS47sGIbffTGUfGoSN+3uRS+CALFIcQ6c+b7hUCuWA3LJkbP2inFjLHJU1nga2vFfINADNpyyDc...
objectset.rio.cattle.io/id: system-upgrade-controller
objectset.rio.cattle.io/owner-gvk: upgrade.cattle.io/v1, Kind=Plan
objectset.rio.cattle.io/owner-name: k3s-agent
objectset.rio.cattle.io/owner-namespace: system-upgrade
upgrade.cattle.io/ttl-seconds-after-finished: 900
Parallelism: 1
Completions: 1
Completion Mode: NonIndexed
Start Time: Sat, 23 Apr 2022 17:34:53 -0400
Active Deadline Seconds: 900s
Pods Statuses: 1 Active / 0 Succeeded / 14 Failed
Pod Template:
Labels: controller-uid=728cbed1-ce28-46ef-be2c-8a640164a121
job-name=apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793
plan.upgrade.cattle.io/k3s-agent=f956d4c7ff118941fde09e3b9cc124faece4ef5a06f0fdc4c91051fc
upgrade.cattle.io/controller=system-upgrade-controller
upgrade.cattle.io/node=dairy
upgrade.cattle.io/plan=k3s-agent
upgrade.cattle.io/version=v1.23.5-k3s1
Service Account: system-upgrade
Init Containers:
prepare:
Image: rancher/k3s-upgrade:v1.23.5-k3s1
Port:
Davids-iMac:K3s$ k logs job.batch/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793 -n system-upgrade Found 15 pods, using pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-2sbvm Error from server (BadRequest): container "upgrade" in pod "apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-2sbvm" is terminated
Davids-iMac:K3s$ k logs pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-r6dvh -n system-upgrade Error from server (BadRequest): container "upgrade" in pod "apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-r6dvh" is waiting to start: PodInitializing
Plan:
These plans are adapted from work by Dax McDonald (https://github.com/daxmc99) and Hussein Galal (https://github.com/galal-hussein)
in support of Rancher v2 managed k3s upgrades. See Also: https://rancher.com/docs/k3s/latest/en/upgrades/automated/
apiVersion: upgrade.cattle.io/v1 kind: Plan metadata: name: k3s-server namespace: system-upgrade labels: k3s-upgrade: server spec: concurrency: 1 # Batch size (roughly maps to maximum number of unschedulable nodes) version: v1.23.5+k3s1 nodeSelector: matchExpressions: - {key: k3s-upgrade, operator: Exists} - {key: k3s-upgrade, operator: NotIn, values: ["disabled", "false"]} - {key: k3os.io/mode, operator: DoesNotExist} - {key: node-role.kubernetes.io/control-plane, operator: Exists} serviceAccountName: system-upgrade tolerations:
- key: "node-role.kubernetes.io/master" operator: "Exists" cordon: true upgrade: image: rancher/k3s-upgrade
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
name: k3s-agent
namespace: system-upgrade
labels:
k3s-upgrade: agent
spec:
concurrency: 1 # Batch size (roughly maps to maximum number of unschedulable nodes)
version: v1.23.5+k3s1
nodeSelector:
matchExpressions:
- {key: k3s-upgrade, operator: Exists}
- {key: k3s-upgrade, operator: NotIn, values: ["disabled", "false"]}
- {key: k3os.io/mode, operator: DoesNotExist}
- {key: node-role.kubernetes.io/control-plane, operator: DoesNotExist}
serviceAccountName: system-upgrade
prepare:
# Defaults to the same "resolved" tag that is used for the upgrade container, NOT latest
image: rancher/k3s-upgrade
args: ["prepare", "k3s-server"]
drain:
force: true
skipWaitForDeleteTimeout: 60 # 1.18+ (honor pod disruption budgets up to 60 seconds per pod then moves on)
upgrade:
image: rancher/k3s-upgrade