
v0.8.25 has broken cluster config with containerdConfigPatches

Open · iainsproat opened this issue 2 years ago • 2 comments

The following cluster-config.yaml file works on v0.8.24 but fails in v0.8.25:

apiVersion: ctlptl.dev/v1alpha1
kind: Registry
name: ctlptl-registry
port: 5000
---
apiVersion: ctlptl.dev/v1alpha1
kind: Cluster
product: kind
registry: ctlptl-registry
name: kind-mycluster
kindV1Alpha4Cluster:
  containerdConfigPatches:
    - |-
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."example.internal:5001"]
        endpoint = ["http://example.internal:5001"]
  nodes:
  - role: control-plane
    kubeadmConfigPatches:
    - |
      kind: InitConfiguration
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 5001
        hostPort: 5001
        protocol: TCP
        listenAddress: "127.0.0.1"

Note that I have appended 127.0.0.1 example.internal to my /etc/hosts file. The configuration is this complex because I am hosting another registry within the kind cluster and exposing it on port 5001.
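For reference, the /etc/hosts entry described above is the single line:

127.0.0.1 example.internal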

A minimal failing example:

apiVersion: ctlptl.dev/v1alpha1
kind: Cluster
product: kind
name: kind-mycluster
kindV1Alpha4Cluster:
  containerdConfigPatches:
    - |-
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."example.internal:5001"]
        endpoint = ["http://example.internal:5001"]

I then run the command ctlptl apply -f cluster-config.yaml.

The failure message is as follows:

registry.ctlptl.dev/ctlptl-registry created
No kind clusters found.
Creating cluster "mycluster" ...
 ✓ Ensuring node image (kindest/node:v1.28.0) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✗ Starting control-plane 🕹️
Deleted nodes: ["mycluster-control-plane"]
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged mycluster-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1
Command Output: I1130 13:28:20.858198     170 initconfiguration.go:255] loading configuration from "/kind/kubeadm.conf"
W1130 13:28:20.859005     170 initconfiguration.go:336] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
I1130 13:28:20.864778     170 certs.go:112] creating a new certificate authority for ca
[init] Using Kubernetes version: v1.28.0
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
I1130 13:28:20.937181     170 certs.go:519] validating certificate period for ca certificate
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost mycluster-control-plane] and IPs [10.96.0.1 172.19.0.2 127.0.0.1]
[certs] Generating "apiserver-kubelet-client" certificate and key
I1130 13:28:21.184603     170 certs.go:112] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
I1130 13:28:21.821000     170 certs.go:519] validating certificate period for front-proxy-ca certificate
[certs] Generating "front-proxy-client" certificate and key
I1130 13:28:22.105460     170 certs.go:112] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
I1130 13:28:22.310778     170 certs.go:519] validating certificate period for etcd/ca certificate
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost mycluster-control-plane] and IPs [172.19.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost mycluster-control-plane] and IPs [172.19.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I1130 13:28:23.438336     170 certs.go:78] creating new public/private key files for signing service account users
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
I1130 13:28:23.585848     170 kubeconfig.go:103] creating kubeconfig file for admin.conf
[kubeconfig] Writing "admin.conf" kubeconfig file
I1130 13:28:24.062711     170 kubeconfig.go:103] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I1130 13:28:24.468577     170 kubeconfig.go:103] creating kubeconfig file for controller-manager.conf
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I1130 13:28:24.642536     170 kubeconfig.go:103] creating kubeconfig file for scheduler.conf
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
I1130 13:28:24.915956     170 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I1130 13:28:24.916004     170 manifests.go:102] [control-plane] getting StaticPodSpecs
I1130 13:28:24.916398     170 certs.go:519] validating certificate period for CA certificate
I1130 13:28:24.916446     170 manifests.go:128] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I1130 13:28:24.916450     170 manifests.go:128] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I1130 13:28:24.916452     170 manifests.go:128] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I1130 13:28:24.916454     170 manifests.go:128] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I1130 13:28:24.916457     170 manifests.go:128] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
I1130 13:28:24.916826     170 manifests.go:157] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
I1130 13:28:24.916840     170 manifests.go:102] [control-plane] getting StaticPodSpecs
I1130 13:28:24.916926     170 manifests.go:128] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I1130 13:28:24.916929     170 manifests.go:128] [control-plane] adding volume "etc-ca-certificates" for component "kube-controller-manager"
I1130 13:28:24.916931     170 manifests.go:128] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I1130 13:28:24.916935     170 manifests.go:128] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I1130 13:28:24.916938     170 manifests.go:128] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I1130 13:28:24.916940     170 manifests.go:128] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-controller-manager"
I1130 13:28:24.916942     170 manifests.go:128] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
I1130 13:28:24.917963     170 manifests.go:157] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
I1130 13:28:24.917989     170 manifests.go:102] [control-plane] getting StaticPodSpecs
[control-plane] Creating static Pod manifest for "kube-scheduler"
I1130 13:28:24.918096     170 manifests.go:128] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I1130 13:28:24.918368     170 manifests.go:157] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
I1130 13:28:24.918391     170 kubelet.go:67] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
I1130 13:28:25.086822     170 waitcontrolplane.go:83] [wait-control-plane] Waiting for the API server to be healthy
I1130 13:28:25.088822     170 loader.go:395] Config loaded from file:  /etc/kubernetes/admin.conf
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I1130 13:28:25.101563     170 round_trippers.go:553] GET https://mycluster-control-plane:6443/healthz?timeout=10s  in 3 milliseconds
<multiple repetitions removed for brevity>
I1130 13:29:04.604864     170 round_trippers.go:553] GET https://mycluster-control-plane:6443/healthz?timeout=10s  in 1 milliseconds
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I1130 13:29:05.104074     170 round_trippers.go:553] GET https://mycluster-control-plane:6443/healthz?timeout=10s  in 0 milliseconds

I1130 13:29:09.603468     170 round_trippers.go:553] GET https://mycluster-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I1130 13:29:10.105657     170 round_trippers.go:553] GET https://mycluster-control-plane:6443/healthz?timeout=10s  in 1 milliseconds
<multiple repetitions removed for brevity>
I1130 13:29:20.107134     170 round_trippers.go:553] GET https://mycluster-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I1130 13:29:20.608924     170 round_trippers.go:553] GET https://mycluster-control-plane:6443/healthz?timeout=10s  in 1 milliseconds
<multiple repetitions removed for brevity>
I1130 13:29:40.108681     170 round_trippers.go:553] GET https://mycluster-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I1130 13:29:40.605108     170 round_trippers.go:553] GET https://mycluster-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
<multiple repetitions removed for brevity>
I1130 13:30:20.105067     170 round_trippers.go:553] GET https://mycluster-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
        - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'
couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
        cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:108
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:259
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
        cmd/kubeadm/app/cmd/init.go:111
github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
        cmd/kubeadm/app/kubeadm.go:50
main.main
        cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:250
runtime.goexit
        /usr/local/go/src/runtime/asm_arm64.s:1172
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:260
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
        cmd/kubeadm/app/cmd/init.go:111
github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
        cmd/kubeadm/app/kubeadm.go:50
main.main
        cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:250
runtime.goexit
        /usr/local/go/src/runtime/asm_arm64.s:1172
creating kind cluster: exit status 1

The machine on which this fails is an Apple M2 running macOS Sonoma 14.1.1.

iainsproat · Nov 30 '23 16:11

Hmmmm... I'm not sure how to resolve this.

The problem is that kind v0.20.0 has deprecated containerd CRI mirrors, see https://github.com/kubernetes-sigs/kind/releases/tag/v0.20.0. It currently ships a compatibility shim, but it doesn't work in all cases, and the plan is for this to break upstream soon.

I filed a feature request for a better API for this (https://github.com/kubernetes-sigs/kind/issues/3354), but the root of the issue isn't even kind; it's the upstream containerd deprecations.

nicks · Nov 30 '23 17:11

Thanks for this @nicks - this is incredibly useful information, as I wasn't aware of the changes.

I've attempted to make the necessary changes, but it seems that insecure registries aren't supported (or I've misconfigured something): https://github.com/containerd/containerd/discussions/9454

iainsproat · Dec 01 '23 13:12

I resolved this by upgrading to v0.8.28 and making the following amendments:

In the cluster-config.yaml:

apiVersion: ctlptl.dev/v1alpha1
kind: Cluster
product: kind
name: kind-mycluster
kindV1Alpha4Cluster:
  containerdConfigPatches:
    - |-
      [plugins."io.containerd.grpc.v1.cri".registry]
        config_path = "/etc/containerd/certs.d"

And adding a file at /etc/containerd/certs.d/example.internal:5001/hosts.toml:

server = "http://example.internal:5001"

[host."http://example.internal:5001"]
 capabilities = ["pull", "resolve", "push"]
 skip_verify = true
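The comment above doesn't say how that hosts.toml ends up inside the kind node. One common way is to mount a host directory over /etc/containerd/certs.d using kind's extraMounts. A minimal sketch, assuming the file is kept under /somewhere/on/host/certs.d/example.internal:5001/ on the host machine (that host path is only an illustration, adjust it to wherever you keep the file):

apiVersion: ctlptl.dev/v1alpha1
kind: Cluster
product: kind
name: kind-mycluster
kindV1Alpha4Cluster:
  containerdConfigPatches:
    - |-
      [plugins."io.containerd.grpc.v1.cri".registry]
        config_path = "/etc/containerd/certs.d"
  nodes:
  - role: control-plane
    extraMounts:
      # Assumed host-side location of the certs.d tree containing
      # example.internal:5001/hosts.toml; change to your actual path.
      - hostPath: /somewhere/on/host/certs.d
        containerPath: /etc/containerd/certs.d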

iainsproat · Apr 24 '24 16:04

An update for anyone reading this: it can now be solved more simply by using registryAuths. An example is here: https://github.com/tilt-dev/ctlptl/blob/main/examples/kind_registry_auth.yaml

iainsproat · Jul 11 '25 15:07