terraform-hcloud-kube-hetzner
                        System-upgrade-controller fails to run
Description
Since a recent upgrade of the module, the system-upgrade-controller no longer performs any upgrades. This is the error message:
level=error msg="error syncing 'system-upgrade/k3s-agent': handler system-upgrade-controller: Get \"https://update.k3s.io/v1-release/channels/v1.29\": tls: failed to verify certificate: x509: failed to load system roots and no roots provided; open /etc/ssl/certs/ca-certificates.crt: permission denied, requeuing"
That file does not exist on the host; there is only a ca-bundle.pem. I have recreated the system-upgrade-controller by tainting the kustomization resource, but that did not fix it. I wonder if my system images are simply too old and I need to recreate my control-plane nodes?
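As a quick sanity check, the following sketch reports which of the two candidate CA bundle paths exist on a node. The first path is the one from the error message; the second, /etc/ssl/ca-bundle.pem, is an assumption based on the openSUSE/MicroOS default location:

```shell
# Sketch: report which candidate CA bundle paths exist on this node.
# /etc/ssl/certs/ca-certificates.crt is the path the controller expects;
# /etc/ssl/ca-bundle.pem is the assumed openSUSE/MicroOS default.
check_ca_bundles() {
  for f in /etc/ssl/certs/ca-certificates.crt /etc/ssl/ca-bundle.pem; do
    if [ -e "$f" ]; then
      echo "present: $f"
    else
      echo "missing: $f"
    fi
  done
}
check_ca_bundles
```

Run this on the host (e.g. over SSH) to confirm which bundle is actually available before deciding on a fix.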
Kube.tf file
module "cluster" {
  providers = {
    hcloud = hcloud
  }
  source  = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.13.5"
  hcloud_token    = var.hcloud_token
  ssh_public_key  = file(var.public_key)
  ssh_private_key = file(var.private_key)
  network_region  = "eu-central"
  extra_firewall_rules = [
    **
  ]
  k3s_global_kubelet_args = [
    "kube-reserved=cpu=100m,memory=200Mi,ephemeral-storage=1Gi", "system-reserved=cpu=100m,memory=200Mi", "image-gc-high-threshold=50",
    "image-gc-low-threshold=40", "allowed-unsafe-sysctls=net.ipv4.ip_forward,net.ipv6.conf.all.disable_ipv6"
  ]
  control_plane_nodepools = [
    {
      name        = "control-plane-fsn1",
      server_type = "cpx11",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 1
    },
    {
      name        = "control-plane-nbg1",
      server_type = "cpx11",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 1
    },
    {
      name        = "control-plane-hel1",
      server_type = "cpx11",
      location    = "hel1",
      labels      = [],
      taints      = [],
      count       = 1
    }
  ]
  agent_nodepools = [
    {
      name        = "ag0",
      server_type = "cpx31",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 2
    },
    {
      name        = "ag1",
      server_type = "cpx41",
      location    = "fsn1",
      labels      = ["*/designation=*"],
      taints      = ["node.kubernetes.io/role=*:NoSchedule"],
      count       = 1
    },
  ]
  load_balancer_type     = "lb11"
  load_balancer_location = "nbg1"
  automatically_upgrade_k3s = true
  enable_cert_manager       = true
  ingress_controller        = "none"
  cluster_name = "***"
}
Screenshots
No response
Platform
Mac
@jniebuhr Make sure to upgrade all modules and providers with terraform init -upgrade and try again.
If that fails, please remove both extra_firewall_rules and k3s_global_kubelet_args and try again.
If the above does not work either, you could also try upgrading the node itself; see our readme.
Quick fix: run the container privileged
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: system-upgrade-controller
          image: rancher/system-upgrade-controller:v0.13.4
          securityContext:
            privileged: true
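A possibly less invasive alternative (a sketch, not tested here): mount the host's CA bundle into the container at the path the controller expects, instead of running it privileged. The hostPath /etc/ssl/ca-bundle.pem is an assumption based on the openSUSE default noted above.

```yaml
# Sketch (untested): bind-mount the host CA bundle at the path the
# controller expects, rather than granting full privileges.
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: system-upgrade-controller
          image: rancher/system-upgrade-controller:v0.13.4
          volumeMounts:
            - name: ca-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
      volumes:
        - name: ca-certs
          hostPath:
            path: /etc/ssl/ca-bundle.pem
            type: File
```

This grants the container read access to a single file rather than full host privileges.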
@kossmac Super interesting tip. Did you have that issue too? So you can confirm that this solution works?
@M4t7e Do you think that's OK safety-wise?
This issue seems rare enough, and the fix above is interesting. Closing for now; will reopen if the issue gets more activity.