Hetzner's Rocky 8 Image Doesn't Include tar, Causes kops-configuration.service to Fail
/kind bug
1. What kops version are you running? The command kops version will display this information.
Client version: 1.28.4 (git-v1.28.4)
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
kubernetesVersion: 1.28.6
But it doesn't matter because the nodes never actually unpack k8s.
3. What cloud provider are you using?
Hetzner
4. What commands did you run? What is the simplest way to reproduce this issue?
kops create cluster \
--name=example.k8s.local \
--ssh-public-key=~/.ssh/id_rsa.pub \
--cloud=hetzner \
--zones=ash \
--image=rocky-8 \
--networking=calico \
--network-cidr=10.10.0.0/16 \
--node-size=cpx11 \
--control-plane-size=cpx11
kops update cluster example.k8s.local --yes
kops export kubeconfig example.k8s.local --admin
kops validate cluster --wait 10m
# Observe as resources are created and then the cluster never comes up
# Then ssh into the control plane (or a node, I guess) and see issues
ssh root@control-plane
journalctl -u kops-configuration
which tar
# Confusion from here on
5. What happened after the commands executed?
Nodes were spun up, but on the control plane, we get this:
May 04 01:03:21 control-plane-ash-799518db3544ab1d nodeup[1610]: W0504 01:03:21.778261 1610 main.go:133] got error running nodeup (will retry in 30s): error adding asset "f3a841324845ca6bf0d4091b4fc7f97e18a623172158b72fc3fdcdb9d42d2d37@https://storage.googleapis.com/k8s-artifacts-cni/release/v1.2.0/cni-plugins-linux-amd64-v1.2.0.tgz": error expanding asset file "/var/cache/nodeup/sha256:f3a841324845ca6bf0d4091b4fc7f97e18a623172158b72fc3fdcdb9d42d2d37_cni-plugins-linux-amd64-v1_2_0_tgz" exec: "tar": executable file not found in $PATH:
And indeed:
[root@control-plane-ash-799518db3544ab1d ~]# which tar
/usr/bin/which: no tar in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
6. What did you expect to happen?
The control plane to unpack the file and set itself up correctly.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "[REDACTED]"
  name: [REDACTED]
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: hetzner
  configBase: s3://[REDACTED]
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: control-plane-ash
      name: h
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: control-plane-ash
      name: h
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.28.6
  networkCIDR: 10.10.0.0/16
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - name: ash
    type: Public
    zone: ash
  topology:
    dns:
      type: None
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "[REDACTED]"
  labels:
    kops.k8s.io/cluster: [REDACTED]
  name: control-plane-ash
spec:
  image: rocky-8
  machineType: cpx11
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - ash
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "[REDACTED]"
  labels:
    kops.k8s.io/cluster: [REDACTED]
  name: nodes-ash
spec:
  image: rocky-8
  machineType: cpx11
  maxSize: 3
  minSize: 3
  role: Node
  subnets:
  - ash
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
As this is a bug with cloud-init setup scripts (presumably), I've not included any output of a kops command here. The issue is dependencies not being installed correctly once the machines are given agency to set themselves up.
9. Anything else we need to know?
I am so very, very confused as to why Hetzner's image doesn't include tar.
Thanks for reporting this @rehashedsalt. Could you try using the packages config option to install tar (I'm not sure whether package installation runs before the untar step or not)? https://kops.sigs.k8s.io/instance_groups/#packages
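For reference, that option would look roughly like this on the instance group (untested sketch based on the linked docs; the instance group name is taken from the manifest above):

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: control-plane-ash
spec:
  # Ask nodeup to install extra OS packages on the instance.
  # Whether this happens before the asset untar step is the open question above.
  packages:
  - tar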
No dice. additionalUserData with a cloud-init spec to install the package should work, though, since cloud-init installs kops-configuration.service as its last job.
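Something along these lines, for example (untested sketch; the part name is arbitrary, and the same block would presumably be needed on the nodes-ash instance group as well):

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: control-plane-ash
spec:
  additionalUserData:
  # Extra cloud-init part that installs tar before kops-configuration.service runs.
  - name: install-tar.txt
    type: text/cloud-config
    content: |
      #cloud-config
      packages:
      - tar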
Yes, additionalUserData will do it. I can't think of a better workaround for now.
I will look into moving the logic to pure Go, instead of calling the tar executable.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.