
KubeadmConfig owner references not restored

Open killianmuldoon opened this issue 2 years ago • 5 comments

When Cluster API is backed up and restored, the ownerReferences on KubeadmConfig objects are not restored. Because the KubeadmConfig reconciler checks for this ownerReference, these KubeadmConfigs are not reconciled after a restore.
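For context, here is a minimal sketch of the pattern at play. It is illustrative only (assumed type and field names, not the actual KubeadmConfig reconciler source): with no Machine owner reference present, the reconciler returns early and the object is never processed.

```go
package sketch

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1beta1"
)

// KubeadmConfigReconciler is a stand-in for the real reconciler type.
type KubeadmConfigReconciler struct {
	Client client.Client
}

// Reconcile shows the early-return pattern: with no Machine owner reference
// (e.g. after a restore that dropped ownerReferences) the object is skipped.
func (r *KubeadmConfigReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	config := &bootstrapv1.KubeadmConfig{}
	if err := r.Client.Get(ctx, req.NamespacedName, config); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Look for the owning Machine in metadata.ownerReferences.
	hasMachineOwner := false
	for _, ref := range config.GetOwnerReferences() {
		if ref.Kind == "Machine" {
			hasMachineOwner = true
			break
		}
	}
	if !hasMachineOwner {
		// Nothing owns this config (or the reference was lost during restore):
		// return without doing any bootstrap work.
		return ctrl.Result{}, nil
	}

	// ... normal bootstrap reconciliation would continue here ...
	return ctrl.Result{}, nil
}
```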

I'm not sure whether this has a practical impact on the functioning of the cluster, as:

  1. New Machines will create new KubeadmConfigs, so bootstrapping of new machines should not be impacted.

  2. If the Machines being restored are already bootstrapped, the status fields set during reconciliation shouldn't be needed in future.

It's possible that the lack of a reconcile on restored KubeadmConfigs will have a different impact for MachinePools.

Regardless, I think we should attempt to restore the ownerReferences on KubeadmConfig objects that are missing them. We could rebuild the reference from the Cluster name + ControlPlane / MachineDeployment name labels.
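A rough sketch of that label-based idea (a hypothetical helper, not part of Cluster API and not necessarily how a fix would land; it additionally assumes the owning Machine's `spec.bootstrap.configRef` still points at the restored KubeadmConfig):

```go
package sketch

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1beta1"
)

// restoreOwnerRef is a hypothetical helper: it lists Machines carrying the
// same cluster-name label, finds the one whose bootstrap configRef points at
// this KubeadmConfig, and re-attaches it as the controller owner.
func restoreOwnerRef(ctx context.Context, c client.Client, scheme *runtime.Scheme, config *bootstrapv1.KubeadmConfig) error {
	machines := &clusterv1.MachineList{}
	if err := c.List(ctx, machines,
		client.InNamespace(config.Namespace),
		client.MatchingLabels{"cluster.x-k8s.io/cluster-name": config.Labels["cluster.x-k8s.io/cluster-name"]},
	); err != nil {
		return err
	}
	for i := range machines.Items {
		m := &machines.Items[i]
		if m.Spec.Bootstrap.ConfigRef == nil || m.Spec.Bootstrap.ConfigRef.Name != config.Name {
			continue
		}
		orig := config.DeepCopy()
		// SetControllerReference rebuilds metadata.ownerReferences using the
		// Machine's post-restore UID.
		if err := controllerutil.SetControllerReference(m, config, scheme); err != nil {
			return err
		}
		return c.Patch(ctx, config, client.MergeFrom(orig))
	}
	return nil // no matching Machine found; leave the object untouched
}
```

Matching on `spec.bootstrap.configRef` pins the exact Machine, while the cluster-name label keeps the list call scoped; the same idea could presumably be extended to ControlPlane-owned configs.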

/area bootstrap
/kind bug
/kind cleanup

Possibly related: https://github.com/kubernetes-sigs/cluster-api/issues/3134

killianmuldoon · Jul 26 '22 12:07

Here is the KubeadmConfig from my testing, before the backup:

```json
{
  "apiVersion":"bootstrap.cluster.x-k8s.io/v1beta1",
  "kind":"KubeadmConfig",
  "metadata":{
    "annotations":{
      "cluster.x-k8s.io/cloned-from-groupkind":"KubeadmConfigTemplate.bootstrap.cluster.x-k8s.io",
      "cluster.x-k8s.io/cloned-from-name":"tkg-vc-antrea-md-0"
    },
    "creationTimestamp":"2022-07-26T02:00:44Z",
    "generation":2,
    "labels":{
      "cluster.x-k8s.io/cluster-name":"tkg-vc-antrea",
      "cluster.x-k8s.io/deployment-name":"tkg-vc-antrea-md-0",
      "machine-template-hash":"2318501170",
      "node-pool":"tkg-vc-antrea-worker-pool"
    },
    "name":"tkg-vc-antrea-md-0-g9vlq",
    "namespace":"default",
    "ownerReferences":[
      {
        "apiVersion":"cluster.x-k8s.io/v1beta1",
        "kind":"MachineSet",
        "name":"tkg-vc-antrea-md-0-675d9455c4",
        "uid":"aaf0f32e-30bf-48ec-98b2-3460025abf79"
      },
      {
        "apiVersion":"cluster.x-k8s.io/v1beta1",
        "blockOwnerDeletion":true,
        "controller":true,
        "kind":"Machine",
        "name":"tkg-vc-antrea-md-0-675d9455c4-crnkg",
        "uid":"79e13399-c348-4752-82d5-1aa09370a284"
      }
    ],
    "resourceVersion":"33144",
    "uid":"988556f1-7566-4eff-8470-ecf52c400455"
  },
  "spec":{
    "files":[

    ],
    "format":"cloud-config",
    "joinConfiguration":{
      "discovery":{
        "bootstrapToken":{
          "apiServerEndpoint":"10.180.130.83:6443",
          "caCertHashes":[
            "sha256:ea50162f9a80682561413b2dac768e7b1de60350adb30c090f71ae5645203ec7"
          ],
          "token":"6p9lrl.9f56wnkq55vkjgro"
        }
      },
      "nodeRegistration":{
        "criSocket":"/var/run/containerd/containerd.sock",
        "kubeletExtraArgs":{
          "cloud-provider":"external",
          "tls-cipher-suites":"xxx"
        },
        "name":"{{ ds.meta_data.hostname }}"
      }
    },
    "preKubeadmCommands":[
      "hostname \"{{ ds.meta_data.hostname }}\"",
      "echo \"::1         ipv6-localhost ipv6-loopback\" \u003e/etc/hosts",
      "echo \"127.0.0.1   localhost\" \u003e\u003e/etc/hosts",
      "echo \"127.0.0.1   {{ ds.meta_data.hostname }}\" \u003e\u003e/etc/hosts",
      "echo \"{{ ds.meta_data.hostname }}\" \u003e/etc/hostname",
      "sed -i 's|\".*/pause|\"projects-stg.registry.vmware.com/tkg/pause|' /etc/containerd/config.toml",
      "systemctl restart containerd"
    ],
    "useExperimentalRetryJoin":true,
    "users":[
      {
        "name":"capv",
        "sshAuthorizedKeys":[
          "ssh-rsa xxx"
        ],
        "sudo":"ALL=(ALL) NOPASSWD:ALL"
      }
    ]
  },
  "status":{
    "conditions":[
      {
        "lastTransitionTime":"2022-07-26T02:02:45Z",
        "status":"True",
        "type":"Ready"
      },
      {
        "lastTransitionTime":"2022-07-26T02:02:45Z",
        "status":"True",
        "type":"CertificatesAvailable"
      },
      {
        "lastTransitionTime":"2022-07-26T02:02:45Z",
        "status":"True",
        "type":"DataSecretAvailable"
      }
    ],
    "dataSecretName":"tkg-vc-antrea-md-0-g9vlq",
    "observedGeneration":2,
    "ready":true
  }
}
```

The same one after restoring:

```json
{
    "apiVersion": "bootstrap.cluster.x-k8s.io/v1beta1",
    "kind": "KubeadmConfig",
    "metadata": {
        "annotations": {
            "cluster.x-k8s.io/cloned-from-groupkind": "KubeadmConfigTemplate.bootstrap.cluster.x-k8s.io",
            "cluster.x-k8s.io/cloned-from-name": "tkg-vc-antrea-md-0"
        },
        "creationTimestamp": "2022-07-26T03:06:52Z",
        "generation": 1,
        "labels": {
            "cluster.x-k8s.io/cluster-name": "tkg-vc-antrea",
            "cluster.x-k8s.io/deployment-name": "tkg-vc-antrea-md-0",
            "machine-template-hash": "2318501170",
            "node-pool": "tkg-vc-antrea-worker-pool",
            "velero.io/backup-name": "prod-backup-include-146",
            "velero.io/restore-name": "prod-restore-include-146"
        },
        "name": "tkg-vc-antrea-md-0-g9vlq",
        "namespace": "default",
        "resourceVersion": "15925",
        "uid": "a348bd01-fbbc-42a6-bc11-178f4630cf82"
    },
    "spec": {
        "files": [],
        "format": "cloud-config",
        "joinConfiguration": {
            "discovery": {
                "bootstrapToken": {
                    "apiServerEndpoint": "10.180.130.83:6443",
                    "caCertHashes": [
                        "sha256:ea50162f9a80682561413b2dac768e7b1de60350adb30c090f71ae5645203ec7"
                    ],
                    "token": "6p9lrl.9f56wnkq55vkjgro"
                }
            },
            "nodeRegistration": {
                "criSocket": "/var/run/containerd/containerd.sock",
                "kubeletExtraArgs": {
                    "cloud-provider": "external",
                    "tls-cipher-suites": "xxx"
                },
                "name": "{{ ds.meta_data.hostname }}"
            }
        },
        "preKubeadmCommands": [
            "hostname \"{{ ds.meta_data.hostname }}\"",
            "echo \"::1         ipv6-localhost ipv6-loopback\" \u003e/etc/hosts",
            "echo \"127.0.0.1   localhost\" \u003e\u003e/etc/hosts",
            "echo \"127.0.0.1   {{ ds.meta_data.hostname }}\" \u003e\u003e/etc/hosts",
            "echo \"{{ ds.meta_data.hostname }}\" \u003e/etc/hostname",
            "sed -i 's|\".*/pause|\"projects-stg.registry.vmware.com/tkg/pause|' /etc/containerd/config.toml",
            "systemctl restart containerd"
        ],
        "useExperimentalRetryJoin": true,
        "users": [
            {
                "name": "capv",
                "sshAuthorizedKeys": [
                    "ssh-rsa xxx"
                ],
                "sudo": "ALL=(ALL) NOPASSWD:ALL"
            }
        ]
    }
}
```

ywk253100 · Jul 27 '22 00:07

I guess one impact of the current behavior is that cluster deletion would ignore the restored KubeadmConfigs without ownerRefs?

sbueringer · Aug 16 '22 12:08

The restored KubeadmConfig has no status section if it cannot be adopted by its owner.

This may impact downstream consumers, e.g. Tanzu considers the machine to still be configuring because it cannot determine the status.

ywk253100 · Aug 17 '22 02:08

IMO fixing this behavior is mostly about ensuring a proper cleanup.

WRT downstream consumers, I don't think they should check the status of the BootstrapConfig if machines are already provisioned; it is redundant and not representative of the current machine state (bootstrap has already completed).

fabriziopandini · Aug 26 '22 11:08

/assign

killianmuldoon · Oct 11 '22 14:10

/close

This is fixed in #7394

killianmuldoon · Nov 29 '22 16:11

@killianmuldoon: Closing this issue.

In response to this:

/close

This is fixed in #7394

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · Nov 29 '22 16:11