system-upgrade-controller

K3s Upgrade Plan, 5 of 6 nodes upgraded, one node was terminated with badrequest

Open braucktoon opened this issue 3 years ago • 0 comments

Version v0.9.1

Platform/Architecture
pi@dairy:~ $ uname -a
Linux dairy 5.15.32-v8+ #1538 SMP PREEMPT Thu Mar 31 19:40:39 BST 2022 aarch64 GNU/Linux

Describe the bug 5 out of 6 nodes upgraded to the latest K3s successfully. The upgrade of one node (dairy) failed, and fetching the job logs returns BadRequest because the container is terminated.

To Reproduce Run the out-of-the-box (OOB) plan to upgrade to the latest K3s. All 3 server nodes were upgraded first, then the worker nodes were processed, and one of them failed.

Expected behavior All 6 nodes are upgraded successfully.

Actual behavior 5 of 6 nodes were upgraded.

Additional context
Davids-iMac:K3s$ k get nodes
NAME             STATUS                     ROLES                       AGE   VERSION
dairy            Ready,SchedulingDisabled   <none>                      61d   v1.22.7+k3s1
gail             Ready                      <none>                      55d   v1.23.5+k3s1
glenn            Ready                      <none>                      61d   v1.23.5+k3s1
katy-kat         Ready                      control-plane,etcd,master   61d   v1.23.5+k3s1
squirrelly-dan   Ready                      control-plane,etcd,master   61d   v1.23.5+k3s1
wayne            Ready                      control-plane,etcd,master   61d   v1.23.5+k3s1

Davids-iMac:K3s$ k get pods,jobs -n system-upgrade
NAME                                                                  READY   STATUS       RESTARTS   AGE
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-2sbvm   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-462hs   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-7jkgd   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-7pr27   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-c59jp   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-cq767   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-gsc85   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-kj9lm   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-ntvt8   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-plmtr   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-r6dvh   0/1     Init:1/2     0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-r9vm7   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-vs4zd   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-wnlck   0/1     Init:Error   0          15h
pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-xvtkz   0/1     Init:Error   0          15h
pod/system-upgrade-controller-8677c8fb4-62cr7                         1/1     Running      0          14d

NAME                                                                        COMPLETIONS   DURATION   AGE
job.batch/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793   0/1           15h        15h
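A long pod listing like the one above is easier to read as a tally per status. The following is a minimal sketch, assuming the standard `kubectl get pods --no-headers` layout in which STATUS is the third column. `count_status` is a hypothetical helper name, not part of kubectl or system-upgrade-controller.

```shell
# Tally pods by STATUS (third column of `kubectl get pods --no-headers`).
# count_status is a hypothetical helper; it reads the listing on stdin
# and prints "<count> <status>" lines, highest count first.
count_status() {
  awk '{ n[$3]++ } END { for (s in n) print n[s], s }' | sort -rn
}

# Usage against a live cluster (assumes kubectl is configured):
#   kubectl get pods -n system-upgrade --no-headers | count_status
```

For the listing above, the tally would be 14 pods in Init:Error, 1 in Init:1/2 and 1 Running, which matches the "1 Active / 0 Succeeded / 14 Failed" line in the job description.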

Davids-iMac:K3s$ k describe job.batch/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793 -n system-upgrade
Name:                     apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793
Namespace:                system-upgrade
Selector:                 controller-uid=728cbed1-ce28-46ef-be2c-8a640164a121
Labels:                   objectset.rio.cattle.io/hash=80f52c1aa7257a6b5bd08982446fceff8c1a2394
                          plan.upgrade.cattle.io/k3s-agent=f956d4c7ff118941fde09e3b9cc124faece4ef5a06f0fdc4c91051fc
                          upgrade.cattle.io/controller=system-upgrade-controller
                          upgrade.cattle.io/node=dairy
                          upgrade.cattle.io/plan=k3s-agent
                          upgrade.cattle.io/version=v1.23.5-k3s1
Annotations:              batch.kubernetes.io/job-tracking:
                          objectset.rio.cattle.io/applied: H4sIAAAAAAAA/+xY227bOBN+lf/nteTKiXOQgb3wxu7WaOMYddpFUQQBTY5srilSS47sGIbffTGUfGoSN+3uRS+CALFIcQ6c+b7hUCuWA3LJkbP2inFjLHJU1nga2vFfINADNpyyDc...
                          objectset.rio.cattle.io/id: system-upgrade-controller
                          objectset.rio.cattle.io/owner-gvk: upgrade.cattle.io/v1, Kind=Plan
                          objectset.rio.cattle.io/owner-name: k3s-agent
                          objectset.rio.cattle.io/owner-namespace: system-upgrade
                          upgrade.cattle.io/ttl-seconds-after-finished: 900
Parallelism:              1
Completions:              1
Completion Mode:          NonIndexed
Start Time:               Sat, 23 Apr 2022 17:34:53 -0400
Active Deadline Seconds:  900s
Pods Statuses:            1 Active / 0 Succeeded / 14 Failed
Pod Template:
  Labels:           controller-uid=728cbed1-ce28-46ef-be2c-8a640164a121
                    job-name=apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793
                    plan.upgrade.cattle.io/k3s-agent=f956d4c7ff118941fde09e3b9cc124faece4ef5a06f0fdc4c91051fc
                    upgrade.cattle.io/controller=system-upgrade-controller
                    upgrade.cattle.io/node=dairy
                    upgrade.cattle.io/plan=k3s-agent
                    upgrade.cattle.io/version=v1.23.5-k3s1
  Service Account:  system-upgrade
  Init Containers:
   prepare:
    Image:      rancher/k3s-upgrade:v1.23.5-k3s1
    Port:       <none>
    Host Port:  <none>
    Args:
      prepare
      k3s-server
    Environment:
      SYSTEM_UPGRADE_NODE_NAME:             (v1:spec.nodeName)
      SYSTEM_UPGRADE_POD_NAME:              (v1:metadata.name)
      SYSTEM_UPGRADE_POD_UID:               (v1:metadata.uid)
      SYSTEM_UPGRADE_PLAN_NAME:            k3s-agent
      SYSTEM_UPGRADE_PLAN_LATEST_HASH:     f956d4c7ff118941fde09e3b9cc124faece4ef5a06f0fdc4c91051fc
      SYSTEM_UPGRADE_PLAN_LATEST_VERSION:  v1.23.5-k3s1
    Mounts:
      /host from host-root (rw)
      /run/system-upgrade/pod from pod-info (ro)
   drain:
    Image:      rancher/kubectl:v1.21.9
    Port:       <none>
    Host Port:  <none>
    Args:
      drain
      dairy
      --pod-selector
      !upgrade.cattle.io/controller
      --ignore-daemonsets
      --delete-local-data
      --force
      --skip-wait-for-delete-timeout
      60
    Environment:
      SYSTEM_UPGRADE_NODE_NAME:             (v1:spec.nodeName)
      SYSTEM_UPGRADE_POD_NAME:              (v1:metadata.name)
      SYSTEM_UPGRADE_POD_UID:               (v1:metadata.uid)
      SYSTEM_UPGRADE_PLAN_NAME:            k3s-agent
      SYSTEM_UPGRADE_PLAN_LATEST_HASH:     f956d4c7ff118941fde09e3b9cc124faece4ef5a06f0fdc4c91051fc
      SYSTEM_UPGRADE_PLAN_LATEST_VERSION:  v1.23.5-k3s1
    Mounts:
      /host from host-root (rw)
      /run/system-upgrade/pod from pod-info (ro)
  Containers:
   upgrade:
    Image:      rancher/k3s-upgrade:v1.23.5-k3s1
    Port:       <none>
    Host Port:  <none>
    Environment:
      SYSTEM_UPGRADE_NODE_NAME:             (v1:spec.nodeName)
      SYSTEM_UPGRADE_POD_NAME:              (v1:metadata.name)
      SYSTEM_UPGRADE_POD_UID:               (v1:metadata.uid)
      SYSTEM_UPGRADE_PLAN_NAME:            k3s-agent
      SYSTEM_UPGRADE_PLAN_LATEST_HASH:     f956d4c7ff118941fde09e3b9cc124faece4ef5a06f0fdc4c91051fc
      SYSTEM_UPGRADE_PLAN_LATEST_VERSION:  v1.23.5-k3s1
    Mounts:
      /host from host-root (rw)
      /run/system-upgrade/pod from pod-info (ro)
  Volumes:
   host-root:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  Directory
   pod-info:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
Events:  <none>

Davids-iMac:K3s$ k logs job.batch/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-98793 -n system-upgrade
Found 15 pods, using pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-2sbvm
Error from server (BadRequest): container "upgrade" in pod "apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-2sbvm" is terminated

Davids-iMac:K3s$ k logs pod/apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-r6dvh -n system-upgrade
Error from server (BadRequest): container "upgrade" in pod "apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-r6dvh" is waiting to start: PodInitializing
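Both `k logs` calls above target the `upgrade` container, which never ran because the pods stopped in their init phase (Init:Error or Init:1/2). The failure output is more likely in the init containers, which the job description names `prepare` and `drain`. A minimal sketch of reading them with the `-c` flag of `kubectl logs` follows. `init_logs` is a hypothetical helper name, and the container list is an assumption taken from the job description.

```shell
# Print the logs of each init container of one upgrade pod.
# init_logs is a hypothetical helper; "prepare" and "drain" are the init
# container names shown in the job description above (an assumption for
# any other plan).
init_logs() {
  ns=$1
  pod=$2
  for c in prepare drain; do
    echo "== $c =="
    kubectl logs -n "$ns" "$pod" -c "$c"
  done
}

# Usage (assumes kubectl is configured and the pod still exists):
#   init_logs system-upgrade apply-k3s-agent-on-dairy-with-f956d4c7ff118941fde09e3b9cc-2sbvm
```

Since one pod is at Init:1/2 while the others are in Init:Error, the `drain` container output is probably the first place to look, though that is an inference from the listing rather than something the logs above confirm.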

Plan:

# These plans are adapted from work by Dax McDonald (https://github.com/daxmc99) and Hussein Galal (https://github.com/galal-hussein)
# in support of Rancher v2 managed k3s upgrades. See Also: https://rancher.com/docs/k3s/latest/en/upgrades/automated/
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-server
  namespace: system-upgrade
  labels:
    k3s-upgrade: server
spec:
  concurrency: 1 # Batch size (roughly maps to maximum number of unschedulable nodes)
  version: v1.23.5+k3s1
  nodeSelector:
    matchExpressions:
      - {key: k3s-upgrade, operator: Exists}
      - {key: k3s-upgrade, operator: NotIn, values: ["disabled", "false"]}
      - {key: k3os.io/mode, operator: DoesNotExist}
      - {key: node-role.kubernetes.io/control-plane, operator: Exists}
  serviceAccountName: system-upgrade
  tolerations:
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
  cordon: true
  upgrade:
    image: rancher/k3s-upgrade
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-agent
  namespace: system-upgrade
  labels:
    k3s-upgrade: agent
spec:
  concurrency: 1 # Batch size (roughly maps to maximum number of unschedulable nodes)
  version: v1.23.5+k3s1
  nodeSelector:
    matchExpressions:
      - {key: k3s-upgrade, operator: Exists}
      - {key: k3s-upgrade, operator: NotIn, values: ["disabled", "false"]}
      - {key: k3os.io/mode, operator: DoesNotExist}
      - {key: node-role.kubernetes.io/control-plane, operator: DoesNotExist}
  serviceAccountName: system-upgrade
  prepare:
    # Defaults to the same "resolved" tag that is used for the upgrade container, NOT latest
    image: rancher/k3s-upgrade
    args: ["prepare", "k3s-server"]
  drain:
    force: true
    skipWaitForDeleteTimeout: 60 # 1.18+ (honor pod disruption budgets up to 60 seconds per pod then moves on)
  upgrade:
    image: rancher/k3s-upgrade

braucktoon Apr 24 '22 13:04