
cluster version upgrades don't work

Open lukebond opened this issue 1 year ago • 8 comments

steps to reproduce (see the example commands after this list):

  • create a cluster, wait for it to become healthy
  • set up a terminal watch on kubectl get machines for your cluster
  • in another terminal, set up a second watch on kubectl get kthreescontrolplane for your cluster
  • in another terminal, edit the KthreesControlPlane resource, modifying the version field to some higher version you want to upgrade to
  • observe that the machines watch shows an additional machine created, then immediately after, deleted
  • observe that the KCP watch briefly shows the upgrade progressing (a new, unready node is added), then reverts
  • observe that the version field in the spec that you changed earlier has been reverted
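
for reference, here's one way to drive the steps above from a shell; the namespace, cluster name, and control plane name are placeholders, and the target version is just an example (taken from the Machine shown later in this thread):

# terminal 1: watch the Machines for your cluster
kubectl -n <namespace> get machines -w -l cluster.x-k8s.io/cluster-name=<cluster-name>

# terminal 2: watch the KThreesControlPlane
kubectl -n <namespace> get kthreescontrolplane -w

# terminal 3: bump spec.version to trigger the upgrade
kubectl -n <namespace> patch kthreescontrolplane <control-plane-name> \
  --type merge -p '{"spec":{"version":"v1.24.17+k3s1"}}'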

the problem can be found in the CP provider logs:

2023-12-15T13:16:37Z	INFO	controllers.KThreesControlPlane	Waiting for control plane to pass preflight checks	{"namespace": "account-82cd74f5-359b-4d1d-ba2c-4325cb6ddc94", "KThreesControlPlane": "a55b4ll5-1b2c-47a0-bed0-435742021129-s7hr7-control-plane", "cluster": "a55b4ll5-1b2c-47a0-bed0-435742021129-s7hr7", "failures": "machine a55b4ll5-1b2c-47a0-bed0-435742021129-s7hr7-control-plane-cf5tm does not have AgentHealthy condition: precondition check failed"}

if you check the machine identifier against the machines watch from earlier, you will notice that it's the new machine that was just created. basically the controller created a machine, noticed immediately afterwards that it wasn't ready, and deleted it.

we need to improve the logic here so that it waits for the node to become healthy before eventually giving up, guided by how the kubeadm provider does it.

lukebond avatar Dec 15 '23 13:12 lukebond

I'm not sure there is any logic inside KthreesControlPlane that updates the spec version field. Could it possibly be updated by some external system?

mogliang avatar Jan 17 '24 01:01 mogliang

I am experiencing the same issue with 3 control plane replicas:

2024-01-17T16:38:26Z	INFO	controllers.KThreesControlPlane	ClusterStatus	{"namespace": "default", "KThreesControlPlane": "k3s-control-plane", "cluster": "k3s", "workload": {"Nodes":1,"ReadyNodes":1}}
2024-01-17T16:38:41Z	INFO	controllers.KThreesControlPlane	Reconcile KThreesControlPlane	{"namespace": "default", "KThreesControlPlane": "k3s-control-plane", "cluster": "k3s"}
2024-01-17T16:38:41Z	INFO	controllers.KThreesControlPlane	Scaling up control plane	{"namespace": "default", "KThreesControlPlane": "k3s-control-plane", "cluster": "k3s", "Desired": 3, "Existing": 1}
2024-01-17T16:38:41Z	INFO	controllers.KThreesControlPlane	Waiting for control plane to pass preflight checks	{"namespace": "default", "KThreesControlPlane": "k3s-control-plane", "cluster": "k3s", "failures": "machine k3s-control-plane-jppbp does not have AgentHealthy condition: precondition check failed"}
2024-01-17T16:38:41Z	DEBUG	events	Waiting for control plane to pass preflight checks to continue reconciliation: machine k3s-control-plane-jppbp does not have AgentHealthy condition: precondition check failed	{"type": "Warning", "object": {"kind":"KThreesControlPlane","namespace":"default","name":"k3s-control-plane","uid":"027488d9-e276-4a57-ba53-6f289bd8b097","apiVersion":"controlplane.cluster.x-k8s.io/v1beta1","resourceVersion":"200480"}, "reason": "ControlPlaneUnhealthy"}

It is checking this condition when trying to scale up. Is the AgentHealthy condition ever set? I haven't double-checked yet, just reporting my findings.
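
One way to check whether the AgentHealthy condition ever shows up on the Machines (namespace and machine name below are placeholders):

kubectl -n <namespace> get machines \
  -o custom-columns='NAME:.metadata.name,CONDITIONS:.status.conditions[*].type'

# or list the conditions on a single control-plane machine:
kubectl -n <namespace> get machine <machine-name> \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'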

Regarding the spec.version field, I would expect the KCP to trigger a rolling upgrade, as the kubeadm provider does.

anmazzotti avatar Jan 17 '24 16:01 anmazzotti

Would you please provide the Machine CR here?

I suspect the Machine is unable to be matched with the target cluster node; perhaps the external cloud provider wasn't used?
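
For example, something like this would pull the Machine CR and its linked infrastructure object (names and namespace are placeholders):

kubectl -n <namespace> get machine <machine-name> -o yaml
kubectl -n <namespace> get awsmachine <awsmachine-name> -o yaml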

mogliang avatar Feb 20 '24 06:02 mogliang

Would you please provide the Machine CR here?

I suspect the Machine is unable to be matched with the target cluster node; perhaps the external cloud provider wasn't used?

Here's an example of the Machine that's created when trying to upgrade the control plane:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  annotations:
    controlplane.cluster.x-k8s.io/kthrees-server-configuration: '{"kubeControllerManagerArgs":["address=0.0.0.0","bind-address=0.0.0.0","authorization-always-allow-paths=/healthz,/readyz,/livez,/metrics"],"kubeSchedulerArgs":["address=0.0.0.0","bind-address=0.0.0.0","authorization-always-allow-paths=/healthz,/readyz,/livez,/metrics"],"disableComponents":["traefik","servicelb","coredns"]}'
  creationTimestamp: "2024-03-27T15:19:05Z"
  finalizers:
  - machine.cluster.x-k8s.io
  generation: 2
  labels:
    cluster.x-k8s.io/cluster-name: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g
    cluster.x-k8s.io/control-plane: ""
    cluster.x-k8s.io/control-plane-name: ""
    finance.influxdata.io/category: internal
    granite.influxdata.io/account: 82cd74f5-359b-4d1d-ba2c-4325cb6ddc94
    tubernetes.influxdata.io/cloud-dedicated-cluster-name: f6b71290-cf84-4195-85a2-ec5bc5e8be82
  name: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane-b95sf
  namespace: account-82cd74f5-359b-4d1d-ba2c-4325cb6ddc94
  ownerReferences:
  - apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: KThreesControlPlane
    name: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane
    uid: f0f1fc5f-d0c5-4e30-8228-2a51870a17e5
  resourceVersion: "794755970"
  uid: ae2a1447-8f6c-4ce2-9818-ddde44fcffa5
spec:
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
      kind: KThreesConfig
      name: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane-6kbgs
      namespace: account-82cd74f5-359b-4d1d-ba2c-4325cb6ddc94
      uid: 39abace0-01a6-40df-a21e-8247f7b3aa91
    dataSecretName: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane-6kbgs
  clusterName: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g
  failureDomain: us-east-1a
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSMachine
    name: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane-sbd9x2
    namespace: account-82cd74f5-359b-4d1d-ba2c-4325cb6ddc94
    uid: 5fddaadb-e929-43ad-a418-3c961cd90f87
  nodeDeletionTimeout: 10s
  version: v1.24.17+k3s1
status:
  bootstrapReady: true
  conditions:
  - lastTransitionTime: "2024-03-27T15:19:10Z"
    message: 1 of 2 completed
    reason: InstanceNotReady
    severity: Warning
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-03-27T15:19:06Z"
    status: "True"
    type: BootstrapReady
  - lastTransitionTime: "2024-03-27T15:19:10Z"
    message: 2 of 3 completed
    reason: InstanceNotReady
    severity: Warning
    status: "False"
    type: InfrastructureReady
  - lastTransitionTime: "2024-03-27T15:19:05Z"
    reason: WaitingForNodeRef
    severity: Info
    status: "False"
    type: NodeHealthy
  lastUpdated: "2024-03-27T15:19:06Z"
  observedGeneration: 2
  phase: Provisioning

And here's the AWSMachine:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachine
metadata:
 annotations:
   cluster.x-k8s.io/cloned-from-groupkind: AWSMachineTemplate.infrastructure.cluster.x-k8s.io
   cluster.x-k8s.io/cloned-from-name: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane-small
   sigs.k8s.io/cluster-api-provider-aws-last-applied-security-groups: '{"sg-091d3888587c69749":{}}'
   sigs.k8s.io/cluster-api-provider-aws-last-applied-tags: '{"finance.influxdata.io/category":"internal","granite.influxdata.io/account":"82cd74f5-359b-4d1d-ba2c-4325cb6ddc94","tubernetes.influxdata.io/cloud-dedicated-cluster-name":"f6b71290-cf84-4195-85a2-ec5bc5e8be82","tubernetes.influxdata.io/workload-cluster-name":"f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g"}'
   sigs.k8s.io/cluster-api-provider-last-applied-tags-on-volumes: '{"vol-08fc23e15377b00ec":{"finance.influxdata.io/category":"internal","granite.influxdata.io/account":"82cd74f5-359b-4d1d-ba2c-4325cb6ddc94","tubernetes.influxdata.io/cloud-dedicated-cluster-name":"f6b71290-cf84-4195-85a2-ec5bc5e8be82","tubernetes.influxdata.io/workload-cluster-name":"f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g"}}'
 creationTimestamp: "2024-03-27T15:19:05Z"
 deletionGracePeriodSeconds: 0
 deletionTimestamp: "2024-03-27T15:19:56Z"
 finalizers:
 - awsmachine.infrastructure.cluster.x-k8s.io
 generation: 3
 labels:
   cluster.x-k8s.io/cluster-name: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g
   cluster.x-k8s.io/control-plane: ""
   cluster.x-k8s.io/control-plane-name: ""
   finance.influxdata.io/category: internal
   granite.influxdata.io/account: 82cd74f5-359b-4d1d-ba2c-4325cb6ddc94
   tubernetes.influxdata.io/cloud-dedicated-cluster-name: f6b71290-cf84-4195-85a2-ec5bc5e8be82
 name: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane-sbd9x2
 namespace: account-82cd74f5-359b-4d1d-ba2c-4325cb6ddc94
 ownerReferences:
 - apiVersion: cluster.x-k8s.io/v1beta1
   blockOwnerDeletion: true
   controller: true
   kind: Machine
   name: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane-b95sf
   uid: ae2a1447-8f6c-4ce2-9818-ddde44fcffa5
 resourceVersion: "794758136"
 uid: 5fddaadb-e929-43ad-a418-3c961cd90f87
spec:
 additionalSecurityGroups:
 - filters:
   - name: tag:Name
     values:
     - f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-node-additional
 ami: {}
 cloudInit:
   insecureSkipSecretsManager: true
 iamInstanceProfile: f6b71290-cf84-4195-85a2-ec5bc5e8be82-control-plane
 instanceID: i-0cd0cfdc07f69bf8f
 instanceMetadataOptions:
   httpEndpoint: enabled
   httpPutResponseHopLimit: 5
   httpTokens: optional
   instanceMetadataTags: disabled
 instanceType: t3.medium
 providerID: aws:///us-east-1a/i-0cd0cfdc07f69bf8f
 rootVolume:
   size: 32
 sshKeyName: cst-staging-default
status:
 addresses:
 - address: ip-10-0-217-12.ec2.internal
   type: InternalDNS
 - address: 10.0.217.12
   type: InternalIP
 conditions:
 - lastTransitionTime: "2024-03-27T15:19:57Z"
   message: 1 of 3 completed
   reason: Deleted
   severity: Info
   status: "False"
   type: Ready
 - lastTransitionTime: "2024-03-27T15:19:57Z"
   reason: Deleted
   severity: Info
   status: "False"
   type: ELBAttached
 - lastTransitionTime: "2024-03-27T15:19:57Z"
   reason: Deleted
   severity: Info
   status: "False"
   type: InstanceReady
 - lastTransitionTime: "2024-03-27T15:19:10Z"
   status: "True"
   type: SecurityGroupsReady
 instanceState: running
 ready: true

wikoion avatar Mar 27 '24 14:03 wikoion

The last status conditions before being flagged for deletion:

Conditions:
    Last Transition Time:  2024-03-27T15:31:27Z
    Status:                True
    Type:                  Ready
    Last Transition Time:  2024-03-27T15:30:51Z
    Status:                True
    Type:                  BootstrapReady
    Last Transition Time:  2024-03-27T15:31:27Z
    Status:                True
    Type:                  InfrastructureReady
    Last Transition Time:  2024-03-27T15:31:28Z
    Reason:                NodeProvisioning
    Severity:              Warning
    Status:                False
    Type:                  NodeHealthy
  Infrastructure Ready:    true
  Last Updated:            2024-03-27T15:31:28Z
  Observed Generation:     3
  Phase:                   Provisioned
Events:
  Type    Reason             Age                From                           Message
  ----    ------             ----               ----                           -------
  Normal  DetectedUnhealthy  2s (x19 over 47s)  machinehealthcheck-controller  Machine account-82cd74f5-359b-4d1d-ba2c-4325cb6ddc94/f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane/f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane-9l76q/ has unhealthy node

wikoion avatar Mar 27 '24 15:03 wikoion

@wikoion Could you run kubectl describe node for this CP node in the workload cluster? I hit the same situation (NodeHealthy is false) when I set the disableExternalCloudProvider flag, and the providerID in the AWSMachine did not match the one on the workload cluster node.
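
For reference, a quick way to compare the providerID recorded on the AWSMachine with the one on the workload cluster node (names and kubeconfig path are placeholders):

kubectl -n <namespace> get awsmachine <awsmachine-name> -o jsonpath='{.spec.providerID}{"\n"}'
kubectl --kubeconfig <workload-kubeconfig> get nodes \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'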

nasusoba avatar Apr 01 '24 09:04 nasusoba

@nasusoba Ah interesting, are you saying that if you don't set disableExternalCloudProvider you can upgrade the k8s version of a node? I never see the node join from within the workload cluster, so I am unable to describe it. Here's the config for a healthy control plane node:

Name:               ip-10-0-150-242.ec2.internal
Roles:              control-plane,etcd,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t3.medium
                    beta.kubernetes.io/os=linux
                    egress.k3s.io/cluster=true
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-150-242.ec2.internal
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/etcd=true
                    node-role.kubernetes.io/master=true
                    node.kubernetes.io/instance-type=t3.medium
                    topology.ebs.csi.aws.com/zone=us-east-1a
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1a
Annotations:        cluster.x-k8s.io/cluster-name: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g
                    cluster.x-k8s.io/cluster-namespace: account-82cd74f5-359b-4d1d-ba2c-4325cb6ddc94
                    cluster.x-k8s.io/labels-from-machine:
                    cluster.x-k8s.io/machine: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane-rnz2j
                    cluster.x-k8s.io/owner-kind: KThreesControlPlane
                    cluster.x-k8s.io/owner-name: f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane
                    csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-00d0866900a509e17"}
                    etcd.k3s.cattle.io/node-address: 10.0.150.242
                    etcd.k3s.cattle.io/node-name: ip-10-0-150-242.ec2.internal-c42faa53
                    flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"2a:f5:d8:cb:f9:50"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.0.150.242
                    k3s.io/node-args:
                      ["server","--disable","traefik","--disable","servicelb","--disable","coredns","--disable-cloud-controller","true","--kube-apiserver-arg","...
                    k3s.io/node-config-hash: WE5TW5XEJHZA3REEKEW7IHRXJAPI6NRG6KYVY2SDGCPNQLL2UHKA====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/defdbf336ba6d95025d10079eb09e272e93c4306400d428fafec9e7a24d3279b"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 07 Mar 2024 17:19:42 +0000
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
                    node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-150-242.ec2.internal
  AcquireTime:     <unset>
  RenewTime:       Wed, 03 Apr 2024 16:31:15 +0100
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 03 Apr 2024 16:30:13 +0100   Thu, 07 Mar 2024 17:19:42 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 03 Apr 2024 16:30:13 +0100   Thu, 07 Mar 2024 17:19:42 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 03 Apr 2024 16:30:13 +0100   Thu, 07 Mar 2024 17:19:42 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 03 Apr 2024 16:30:13 +0100   Thu, 07 Mar 2024 17:19:53 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:   10.0.150.242
  InternalDNS:  ip-10-0-150-242.ec2.internal
  Hostname:     ip-10-0-150-242.ec2.internal
Capacity:
  cpu:                2
  ephemeral-storage:  32328636Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3944484Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  31449297077
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3944484Ki
  pods:               110
System Info:
  Machine ID:                 ec274e1f654a49100559b896d547172e
  System UUID:                ec274e1f-654a-4910-0559-b896d547172e
  Boot ID:                    4c8b03da-ec44-4d95-bfc8-a50a51a08e28
  Kernel Version:             5.15.0-1055-aws
  OS Image:                   Ubuntu 20.04.6 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.5.13-k3s1
  Kubelet Version:            v1.23.14+k3s1
  Kube-Proxy Version:         v1.23.14+k3s1
PodCIDR:                      10.42.4.0/24
PodCIDRs:                     10.42.4.0/24
ProviderID:                   aws:///us-east-1a/i-00d0866900a509e17
Non-terminated Pods:          (5 in total)
  Namespace                   Name                                  CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                  ------------  ----------  ---------------  -------------  ---
  kube-system                 aws-cloud-controller-manager-2jcfw    200m (10%)    0 (0%)      0 (0%)           0 (0%)         26d
  kube-system                 ebs-csi-node-dlmvw                    30m (1%)      0 (0%)      120Mi (3%)       768Mi (19%)    26d
  kube-system                 metrics-server-65cd754bcd-vpz4d       100m (5%)     0 (0%)      70Mi (1%)        0 (0%)         6d22h
  monitoring                  node-exporter-kx48w                   112m (5%)     270m (13%)  200Mi (5%)       220Mi (5%)     7d17h
  promtail                    promtail-78sjg                        100m (5%)     100m (5%)   100M (2%)        100M (2%)      26d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests         Limits
  --------           --------         ------
  cpu                542m (27%)       370m (18%)
  memory             508944640 (12%)  1135993088 (28%)
  ephemeral-storage  0 (0%)           0 (0%)
  hugepages-1Gi      0 (0%)           0 (0%)

Here's the KThreesControlPlane config; I'm not setting disableExternalCloudProvider:

Name:         f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane
Namespace:    account-82cd74f5-359b-4d1d-ba2c-4325cb6ddc94
Labels:       cluster.x-k8s.io/cluster-name=f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g
            finance.influxdata.io/category=internal
            granite.influxdata.io/account=82cd74f5-359b-4d1d-ba2c-4325cb6ddc94
            tubernetes.influxdata.io/cloud-dedicated-cluster-name=f6b71290-cf84-4195-85a2-ec5bc5e8be82
Annotations:  <none>
API Version:  controlplane.cluster.x-k8s.io/v1beta1
Kind:         KThreesControlPlane
Metadata:
Creation Timestamp:  2024-03-07T17:13:17Z
Finalizers:
  kthrees.controlplane.cluster.x-k8s.io
Generation:  15
Owner References:
  API Version:           tubernetes.influxdata.io/v1alpha2
  Block Owner Deletion:  true
  Controller:            false
  Kind:                  WorkloadCluster
  Name:                  f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g
  UID:                   113eee62-6fcd-450c-a0fe-338b4a01f50c
  API Version:           cluster.x-k8s.io/v1beta1
  Block Owner Deletion:  true
  Controller:            true
  Kind:                  Cluster
  Name:                  f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g
  UID:                   521ee20d-2bd0-45f3-af7f-d53cb88a3b92
Resource Version:        825209270
UID:                     f0f1fc5f-d0c5-4e30-8228-2a51870a17e5
Spec:
Infrastructure Template:
  API Version:  infrastructure.cluster.x-k8s.io/v1beta2
  Kind:         AWSMachineTemplate
  Name:         f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g-control-plane-small
Kthrees Config Spec:
  Agent Config:
    Kube Proxy Args:
      metrics-bind-address=0.0.0.0
    Node Name:  {{ ds.meta_data.local_hostname }}
    Node Taints:
      node-role.kubernetes.io/control-plane:NoSchedule
      node-role.kubernetes.io/master:NoSchedule
  Server Config:
    Disable Components:
      traefik
      servicelb
      coredns
    Kube Controller Manager Args:
      address=0.0.0.0
      bind-address=0.0.0.0
      authorization-always-allow-paths=/healthz,/readyz,/livez,/metrics
    Kube Scheduler Args:
      address=0.0.0.0
      bind-address=0.0.0.0
      authorization-always-allow-paths=/healthz,/readyz,/livez,/metrics
Machine Template:
  Metadata:
    Labels:
      finance.influxdata.io/category:                         internal
      granite.influxdata.io/account:                          82cd74f5-359b-4d1d-ba2c-4325cb6ddc94
      tubernetes.influxdata.io/cloud-dedicated-cluster-name:  f6b71290-cf84-4195-85a2-ec5bc5e8be82
Replicas:                                                     3
Version:                                                      v1.23.14+k3s1
Status:
Conditions:
  Last Transition Time:  2024-04-03T15:28:56Z
  Status:                True
  Type:                  Ready
  Last Transition Time:  2024-03-07T17:18:23Z
  Status:                True
  Type:                  Available
  Last Transition Time:  2024-03-07T17:17:09Z
  Status:                True
  Type:                  CertificatesAvailable
  Last Transition Time:  2024-03-07T17:21:25Z
  Status:                True
  Type:                  ControlPlaneComponentsHealthy
  Last Transition Time:  2024-04-03T15:28:56Z
  Status:                True
  Type:                  MachinesReady
  Last Transition Time:  2024-04-03T15:27:54Z
  Status:                True
  Type:                  MachinesSpecUpToDate
  Last Transition Time:  2024-04-03T15:28:55Z
  Status:                True
  Type:                  Resized
  Last Transition Time:  2024-03-07T17:17:09Z
  Status:                True
  Type:                  TokenAvailable
Initialized:             true
Observed Generation:     1
Ready:                   true
Ready Replicas:          3
Replicas:                3
Selector:                cluster.x-k8s.io/cluster-name=f6b71290-cf84-4195-85a2-ec5bc5e8be82-v7j8g,cluster.x-k8s.io/control-plane-name
Updated Replicas:        3

wikoion avatar Apr 03 '24 15:04 wikoion

@wikoion Yes, I am able to upgrade the cluster if I do not set disableExternalCloudProvider. I am also testing with CAPD (Docker as the infrastructure provider), and I am not able to reproduce the failure with the latest main build. The upgrade is successful, so I am guessing this issue is already fixed on the latest main. Could you also try upgrading the controller to see if this issue is fixed?
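
If the management cluster was set up with clusterctl, upgrading the providers looks roughly like the following (a sketch only: it assumes a clusterctl version that registers the k3s providers, and the versions are placeholders):

clusterctl upgrade plan
clusterctl upgrade apply --bootstrap k3s:<version> --control-plane k3s:<version>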

nasusoba avatar Apr 07 '24 04:04 nasusoba

I traced the issue back to k3s configuration differences between versions: we were passing the --address flag to the kube-scheduler, which has been removed in the version we were trying to upgrade to. This issue can be closed, thank you for the help!
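
For anyone hitting the same thing, the fix was simply to drop the removed flag from the scheduler args before bumping spec.version. A rough sketch (resource name and namespace are placeholders; the field path follows the KThreesControlPlane spec shown above):

# remove address=0.0.0.0 from kubeSchedulerArgs, keeping the other args
kubectl -n <namespace> patch kthreescontrolplane <control-plane-name> --type merge \
  -p '{"spec":{"kthreesConfigSpec":{"serverConfig":{"kubeSchedulerArgs":["bind-address=0.0.0.0","authorization-always-allow-paths=/healthz,/readyz,/livez,/metrics"]}}}}'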

wikoion avatar Apr 17 '24 13:04 wikoion