Hetzner's Rocky 8 Image Doesn't Include tar, Causes kops-configuration.service to Fail
/kind bug
1. What kops version are you running? The command kops version will display this information.
Client version: 1.28.4 (git-v1.28.4)
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
kubernetesVersion: 1.28.6
But it doesn't matter because the nodes never actually unpack k8s.
3. What cloud provider are you using?
Hetzner
4. What commands did you run? What is the simplest way to reproduce this issue?
kops create cluster \
--name=example.k8s.local \
--ssh-public-key=~/.ssh/id_rsa.pub \
--cloud=hetzner \
--zones=ash \
--image=rocky-8 \
--networking=calico \
--network-cidr=10.10.0.0/16 \
--node-size=cpx11 \
--control-plane-size=cpx11
kops update cluster example.k8s.local --yes
kops export kubeconfig example.k8s.local --admin
kops validate cluster --wait 10m
# Observe as resources are created and then the cluster never comes up
# Then ssh into the control plane (or a node, I guess) and see issues
ssh root@control-plane
journalctl -u kops-configuration
which tar
# Confusion from here on
5. What happened after the commands executed?
Nodes were spun up, but on the control plane, we get this:
May 04 01:03:21 control-plane-ash-799518db3544ab1d nodeup[1610]: W0504 01:03:21.778261 1610 main.go:133] got error running nodeup (will retry in 30s): error adding asset "f3a841324845ca6bf0d4091b4fc7f97e18a623172158b72fc3fdcdb9d42d2d37@https://storage.googleapis.com/k8s-artifacts-cni/release/v1.2.0/cni-plugins-linux-amd64-v1.2.0.tgz": error expanding asset file "/var/cache/nodeup/sha256:f3a841324845ca6bf0d4091b4fc7f97e18a623172158b72fc3fdcdb9d42d2d37_cni-plugins-linux-amd64-v1_2_0_tgz" exec: "tar": executable file not found in $PATH:
And indeed:
[root@control-plane-ash-799518db3544ab1d ~]# which tar
/usr/bin/which: no tar in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
6. What did you expect to happen?
The control plane to unpack the file and set itself up correctly.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "[REDACTED]"
  name: [REDACTED]
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: hetzner
  configBase: s3://[REDACTED]
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: control-plane-ash
      name: h
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: control-plane-ash
      name: h
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.28.6
  networkCIDR: 10.10.0.0/16
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - name: ash
    type: Public
    zone: ash
  topology:
    dns:
      type: None
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "[REDACTED]"
  labels:
    kops.k8s.io/cluster: [REDACTED]
  name: control-plane-ash
spec:
  image: rocky-8
  machineType: cpx11
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - ash
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "[REDACTED]"
  labels:
    kops.k8s.io/cluster: [REDACTED]
  name: nodes-ash
spec:
  image: rocky-8
  machineType: cpx11
  maxSize: 3
  minSize: 3
  role: Node
  subnets:
  - ash
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
As this is a bug with cloud-init setup scripts (presumably), I've not included any output of a kops command here. The issue is dependencies not being installed correctly once the machines are given agency to set themselves up.
9. Anything else we need to know?
I am so very, very confused as to why Hetzner's image doesn't include tar.
Thanks for reporting this @rehashedsalt. Could you try using the packages config option to install tar (I'm not sure whether package installation runs before the untar step or not)? https://kops.sigs.k8s.io/instance_groups/#packages
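For reference, that option would look roughly like this on the instance group (untested sketch based on the linked docs; the instance group name is taken from the manifest above):

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: control-plane-ash
spec:
  # Ask nodeup to install extra OS packages on the instance.
  # Whether this happens before the asset untar step is the open question above.
  packages:
  - tar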
No dice. additionalUserData with a cloud-init spec to install the package should work, though, since cloud-init installs kops-configuration.service as its last job.
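Something along these lines, for example (untested sketch; the part name is arbitrary, and the same block would presumably be needed on the nodes-ash instance group as well):

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: control-plane-ash
spec:
  additionalUserData:
  # Extra cloud-init part that installs tar before kops-configuration.service runs.
  - name: install-tar.txt
    type: text/cloud-config
    content: |
      #cloud-config
      packages:
      - tar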
Yes, additionalUserData will do it. I can't think of a better workaround for now.
I will look into moving the logic to pure Go, instead of calling the tar executable.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.