Does sidero support rolling updates?

Open anthr76 opened this issue 2 years ago • 6 comments

When the cluster was initially bootstrapped it started at K8s version v1.21.5

If I change the K8s version in the Machine (cluster.x-k8s.io/v1alpha4) with a rolling update strategy, no nodes start provisioning. One machine gets created by CAPI but is never provisioned or bound. Is this expected?

anthr76 avatar Jan 26 '22 19:01 anthr76

Hi @anthr76, Sidero should definitely support rolling updates. If a machine got created, you'll need to check the logs of the various providers: something must be failing, and that is causing the machine selection in Sidero to fail. Dig around a bit more, and if you still have issues, feel free to ping me here or on Slack.

rsmitty avatar Jan 28 '22 19:01 rsmitty

I have bumped the versions in the TalosControlPlane resource:

k get taloscontrolplanes.controlplane.cluster.x-k8s.io -o yaml
apiVersion: v1
items:
- apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
  kind: TalosControlPlane
  metadata:
    creationTimestamp: "2021-10-11T13:06:56Z"
    finalizers:
    - talos.controlplane.cluster.x-k8s.io
    generation: 9
    labels:
      cluster.x-k8s.io/cluster-name: scr1-cluster-0
      kustomize.toolkit.fluxcd.io/name: cluster-0-iac
      kustomize.toolkit.fluxcd.io/namespace: flux-system
    name: scr1-cluster-0-cp
    namespace: default
    ownerReferences:
    - apiVersion: cluster.x-k8s.io/v1alpha4
      blockOwnerDeletion: true
      controller: true
      kind: Cluster
      name: scr1-cluster-0
      uid: 817f7d98-7b9f-4f4e-8c63-528062e1da1e
    resourceVersion: "114043222"
    uid: bbd532ef-e0a2-4fbf-be52-3f50b64fbf4b
  spec:
    controlPlaneConfig:
      controlplane:
        generateType: controlplane
        talosVersion: v1.0.0-beta.1 <-- Changed
      init:
        generateType: init
        talosVersion: v1.0.0-beta.1 <-- Changed
    infrastructureTemplate:
      apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
      kind: MetalMachineTemplate
      name: scr1-cluster-0-cp
    replicas: 3
    version: v1.23.5 <-- Changed

However, the cluster doesn't seem to be doing anything about it:

k get servers,machines,clusters -o wide
NAME                                                           HOSTNAME                      BMC IP                            ACCEPTED   ALLOCATED   CLEAN   POWER   AGE
server.metal.sidero.dev/00000000-0000-0000-0000-d05099defa93   worker-01.scr1.rabbito.tech   worker-1-bmc.scr1.rabbito.tech    true       true        false   on      61d
server.metal.sidero.dev/00c03111-0000-0000-0000-dca63203cf77   worker-08.scr1.rabbito.tech                                     true       true        false   on      154d
server.metal.sidero.dev/00c03111-0000-0000-0000-dca63203d2ff   worker-09.scr1.rabbito.tech                                     true       true        false   on      49d
server.metal.sidero.dev/00c03111-0000-0000-0000-dca632395d69   worker-06.scr1.rabbito.tech                                     true       true        false   on      49d
server.metal.sidero.dev/00c03111-0000-0000-0000-dca632397689   worker-07.scr1.rabbito.tech                                     true       true        false   on      157d
server.metal.sidero.dev/00c03111-0000-0000-0000-dca63246d63c   worker-05.scr1.rabbito.tech                                     true       true        false   on      4d14h
server.metal.sidero.dev/00d03114-0000-0000-0000-dca632cc34a6   worker-04.scr1.rabbito.tech                                     true       true        false   on      157d
server.metal.sidero.dev/3a0a1648-d534-11e3-a6b5-4dfee0117903   master-03.scr1.rabbito.tech   master-03-bmc.scr1.rabbito.tech   true       true        false   on      157d
server.metal.sidero.dev/a1e84776-ca2b-11e3-890c-2715cdd68d02   master-02.scr1.rabbito.tech   master-02-bmc.scr1.rabbito.tech   true       true        false   on      157d
server.metal.sidero.dev/a22c5596-c257-11e3-aa14-9d96fd03c203   master-01.scr1.rabbito.tech   master-01-bmc.scr1.rabbito.tech   true       true        false   on      157d
server.metal.sidero.dev/d057a478-09c5-430f-9759-d05099defa97   worker-02.scr1.rabbito.tech   worker-2-bmc.scr1.rabbito.tech    true       true        false   on      157d
server.metal.sidero.dev/d057a478-09c5-430f-9759-d05099defb8b   worker-03.scr1.rabbito.tech   worker-3-bmc.scr1.rabbito.tech    true       true        false   on      157d

NAME                                                               CLUSTER          AGE     PROVIDERID                                      PHASE     VERSION   NODENAME
machine.cluster.x-k8s.io/scr1-cluster-0-cp-6b4vm                   scr1-cluster-0   157d    sidero://a22c5596-c257-11e3-aa14-9d96fd03c203   Running   v1.21.5   master-01
machine.cluster.x-k8s.io/scr1-cluster-0-cp-g8s6z                   scr1-cluster-0   157d    sidero://3a0a1648-d534-11e3-a6b5-4dfee0117903   Running   v1.21.5   master-03
machine.cluster.x-k8s.io/scr1-cluster-0-cp-j9p79                   scr1-cluster-0   157d    sidero://a1e84776-ca2b-11e3-890c-2715cdd68d02   Running   v1.21.5   master-02
machine.cluster.x-k8s.io/scr1-cluster-0-workers-756d7f769d-66z64   scr1-cluster-0   49d     sidero://00c03111-0000-0000-0000-dca632395d69   Running   v1.21.5   worker-06
machine.cluster.x-k8s.io/scr1-cluster-0-workers-756d7f769d-c8kdt   scr1-cluster-0   157d    sidero://d057a478-09c5-430f-9759-d05099defa97   Running   v1.21.5   worker-02
machine.cluster.x-k8s.io/scr1-cluster-0-workers-756d7f769d-cxct5   scr1-cluster-0   49d     sidero://00c03111-0000-0000-0000-dca63203cf77   Running   v1.21.5   worker-08
machine.cluster.x-k8s.io/scr1-cluster-0-workers-756d7f769d-fpzmr   scr1-cluster-0   61d     sidero://00000000-0000-0000-0000-d05099defa93   Running   v1.21.5   worker-01
machine.cluster.x-k8s.io/scr1-cluster-0-workers-756d7f769d-n2qvj   scr1-cluster-0   47d     sidero://00c03111-0000-0000-0000-dca63203d2ff   Running   v1.21.5   worker-09
machine.cluster.x-k8s.io/scr1-cluster-0-workers-756d7f769d-rr5nr   scr1-cluster-0   157d    sidero://d057a478-09c5-430f-9759-d05099defb8b   Running   v1.21.5   worker-03
machine.cluster.x-k8s.io/scr1-cluster-0-workers-756d7f769d-x4kg4   scr1-cluster-0   157d    sidero://00c03111-0000-0000-0000-dca632397689   Running   v1.21.5   worker-07
machine.cluster.x-k8s.io/scr1-cluster-0-workers-756d7f769d-xhrwk   scr1-cluster-0   157d    sidero://00d03114-0000-0000-0000-dca632cc34a6   Running   v1.21.5   worker-04
machine.cluster.x-k8s.io/scr1-cluster-0-workers-756d7f769d-xwmkh   scr1-cluster-0   4d14h   sidero://00c03111-0000-0000-0000-dca63246d63c   Running   v1.21.5   worker-05

NAME                                      AGE    PHASE
cluster.cluster.x-k8s.io/scr1-cluster-0   157d   Provisioned

@rsmitty, any suggestions on where I can look for logs?

anthr76 avatar Mar 17 '22 13:03 anthr76

You'd have to replace the machines. Currently, only new servers/machines get the new config.

frezbo avatar Mar 17 '22 14:03 frezbo
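
Following the suggestion above to replace machines, a minimal sketch for finding which Machine objects still report an older Kubernetes version could look like this. The helper name `stale_machines` is hypothetical, and the commented usage assumes `kubectl` points at the management cluster and that the Machine objects carry `.spec.version`. The delete step is shown only as a comment, since whether the providers recreate a machine cleanly depends on the setup.

```shell
# stale_machines TARGET: reads "NAME VERSION" pairs on stdin and prints
# the names whose version differs from TARGET (hypothetical helper).
stale_machines() {
  awk -v target="$1" 'NF >= 2 && $2 != target { print $1 }'
}

# Assumed usage against the management cluster (not run here):
#   kubectl get machines.cluster.x-k8s.io --no-headers \
#     -o custom-columns=NAME:.metadata.name,VERSION:.spec.version \
#     | stale_machines v1.23.5
# Each listed machine could then be replaced one at a time, e.g.:
#   kubectl delete machine.cluster.x-k8s.io <name>
```

Replacing one machine at a time and waiting for its replacement to reach the Running phase keeps the control plane quorum intact.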

Yep, that makes sense. So a rollingUpdate does not happen then?

anthr76 avatar Mar 17 '22 22:03 anthr76

Can we simply delete the machine and leave the server resource to do this? Or do we have to delete both objects, essentially following the decommission documentation?

One thing I've noticed is that setting the talosVersion to v1.3.6 in the TalosControlPlane doesn't seem to matter. Every time a new server comes online, it says v1.3.0 when I run talosctl -n <node> version.

japtain-cack avatar Apr 09 '23 19:04 japtain-cack

The talosVersion: field only specifies the machine configuration contract; see https://github.com/siderolabs/cluster-api-bootstrap-provider-talos/#usage

smira avatar Apr 10 '23 10:04 smira