terraform-module-k3s
Windows Terraform - SSH authentication failed
Hi, I am trying to create a k3s cluster on Hetzner Cloud with this Terraform script, but the run times out when connecting to the machine over SSH. I created a server manually and assigned the key, and the key worked fine. But when I start the script it unfortunately does not work. Where does the script look for the private key?
Error: timeout - last error: SSH authentication failed (root@XXX.XXX.XXX.XXX:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Connecting to remote host via SSH...
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Host: XXX.XXX.XXX.XXX
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   User: root
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Password: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Private key: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Certificate: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   SSH Agent: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Checking Host Key: false
My Windows commands to generate a key:
# Change the Windows Service config for "OpenSSH Authentication Agent"
sc config "ssh-agent" start= delayed-auto
sc start "ssh-agent"
# Create a private/public key pair
ssh-keygen -t ecdsa -b 521 -f myKey
ssh-add myKey
I don't know why Terraform doesn't use your SSH agent :thinking: ... Just to be sure, are your k3s instances created with your public key?
Also, if I remember correctly, the SSH agent is only supported through Pageant on Windows (cf. https://www.terraform.io/docs/language/resources/provisioners/connection.html#agent), so I don't understand why it works for your other instances.
I'm sorry, I have never used Terraform on Windows directly (only on WSL), so I don't know how to resolve this issue :(
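To illustrate what that means on the Terraform side: the provisioner connection is configured in a connection block like the rough sketch below. This is a generic, hypothetical example (resource name, host and key path are placeholders, not this module's actual code).
# Hypothetical sketch of a remote-exec connection; not this module's actual code
resource "null_resource" "ssh_example" {
  connection {
    type = "ssh"
    host = "XXX.XXX.XXX.XXX"                       # placeholder node address
    user = "root"

    # Option 1: use a running SSH agent (on Windows, Terraform only talks to Pageant)
    # agent = true

    # Option 2: bypass the agent and pass the private key directly
    private_key = file(pathexpand("~/.ssh/myKey")) # assumed key path
  }

  provisioner "remote-exec" {
    inline = ["echo connected"]
  }
}
Passing the key explicitly (option 2) is one way to rule out agent problems on Windows.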
I have now tried with Pageant.
I can connect with PuTTY over Pageant (without a password), but it does not work with Terraform.
I am not sure whether this should be true here:
module.k3s.null_resource.k8s_ca_certificates_install[4] (remote-exec): SSH Agent: false
https://stackoverflow.com/a/58781305/6097503
I have installed Ubuntu on my Windows machine with WSL2, but I get the same error...
I can connect via SSH with a key file from this Linux machine to the Linux server in the Hetzner Cloud.
Error: timeout - last error: SSH authentication failed (root@XXX.XXX.XXX.XXX:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain
ssh-keygen
more /root/.ssh/id_rsa.pub
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=$(dpkg --print-architecture)] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
apt install terraform
git clone https://github.com/xunleii/terraform-module-k3s.git
cd terraform-module-k3s/examples/hcloud-k3s/
terraform init
terraform apply
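Before running terraform apply, it may be worth checking that the generated key actually works non-interactively from WSL; this is just an assumed sanity check (the IP is a placeholder for one of your existing servers):
# Assumed sanity check: load the key into an agent and test a non-interactive login
eval "$(ssh-agent -s)"
ssh-add /root/.ssh/id_rsa
ssh -o BatchMode=yes -o StrictHostKeyChecking=accept-new root@XXX.XXX.XXX.XXX 'echo ok'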
I don't see why it is not working. As far as I can see, you are using the given example.
Do you see something like this when you use WSL?
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Connecting to remote host via SSH...
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Host: XXX.XXX.XXX.XXX
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   User: root
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Password: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Private key: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Certificate: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   SSH Agent: true
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Checking Host Key: false
If so, does running the following commands solve your problem?
git clone https://github.com/xunleii/terraform-module-k3s.git
cd terraform-module-k3s/examples/hcloud-k3s/
ssh-keygen -f hcloud.id_rsa
ssh-add hcloud.id_rsa
terraform init
terraform apply --var ssh_key="$(cat hcloud.id_rsa.pub)"
Sorry it is taking time to solve this problem; I have never encountered this issue :disappointed:
Thanks for your support. I won't be back in the office until Monday; then I will review your suggestion.
First I got this error: Could not open a connection to your authentication agent.
I fixed it by executing eval "$(ssh-agent)".
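So the full sequence, following your suggestion, was roughly this (the ssh-add -l check is just an extra sanity step):
eval "$(ssh-agent -s)"   # start an agent in the current shell
ssh-add hcloud.id_rsa    # load the key generated earlier
ssh-add -l               # confirm the key is listed
terraform apply --var ssh_key="$(cat hcloud.id_rsa.pub)"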
After that I had problems with the certificates:
module.k3s.null_resource.agents_label["k3s-agent-0_node|node.kubernetes.io/pool"]: Creation complete after 13s [id=4521186966452460074]
module.k3s.null_resource.agents_label["k3s-agent-1_node|node.kubernetes.io/pool"] (remote-exec): node/k3s-agent-1 labeled
module.k3s.null_resource.agents_label["k3s-agent-2_node|node.kubernetes.io/pool"] (remote-exec): node/k3s-agent-2 labeled
module.k3s.null_resource.agents_label["k3s-agent-1_node|node.kubernetes.io/pool"]: Creation complete after 14s [id=495787829675262765]
module.k3s.null_resource.agents_label["k3s-agent-2_node|node.kubernetes.io/pool"]: Creation complete after 14s [id=8595809587793038789]
module.k3s.null_resource.kubernetes_ready: Creating...
module.k3s.null_resource.kubernetes_ready: Creation complete after 0s [id=8453258558289144403]
kubernetes_service_account.bootstrap: Creating...
kubernetes_cluster_role_binding.boostrap: Creating...
Error: Post "https://XX.XX.XX.XX:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings": x509: certificate signed by unknown authority
Error: Post "https://XX.XX.XX.XX:6443/api/v1/namespaces/default/serviceaccounts": x509: certificate signed by unknown authority
Thanks for your help. First, good news: your cluster is provisioned. But this certificate error is really weird and should not occur. Have you changed anything in examples/hcloud-k3s?
Can you try this inside the examples/hcloud-k3s directory?
# add output with the generated kubeconfig
cat <<EOF >> outputs.tf
output "kubeconfig" {
value = module.k3s.kube_config
}
EOF
# generate the kubeconfig
terraform output -raw kubeconfig > kubeconfig
# test if you can access on your side
KUBECONFIG=./kubeconfig kubectl version
I think you will have the same certificate problem. If it doesn't work, can you compare the certificates inside the generated kubeconfig with the ones present in /etc/rancher/k3s/k3s.yaml on a control-plane node? They probably differ.
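A quick way to compare them could be to extract the CA data from both files and diff it; this is only a sketch (the control-plane IP is a placeholder, and it assumes the kubeconfig generated above):
# Sketch: compare the CA certificate of the generated kubeconfig with the one on a control-plane node
grep 'certificate-authority-data' ./kubeconfig | awk '{print $2}' | base64 -d > ca_local.pem
ssh root@XX.XX.XX.XX "grep 'certificate-authority-data' /etc/rancher/k3s/k3s.yaml" | awk '{print $2}' | base64 -d > ca_remote.pem
diff ca_local.pem ca_remote.pem && echo "certificates match" || echo "certificates differ"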
Hi, I have reset my WSL Ubuntu container and also created a new clean project in the Hetzner Cloud. Now the Terraform setup completes without a failure.
Now I have tried to add the new cluster to my Rancher, but it always stays pending.
kubectl get events
root@k3s-control-plane-0:~# kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
25m Normal Starting node/k3s-agent-0 Starting kubelet.
25m Warning InvalidDiskCapacity node/k3s-agent-0 invalid capacity 0 on image filesystem
25m Normal NodeHasSufficientMemory node/k3s-agent-0 Node k3s-agent-0 status is now: NodeHasSufficientMemory
25m Normal NodeHasNoDiskPressure node/k3s-agent-0 Node k3s-agent-0 status is now: NodeHasNoDiskPressure
25m Normal NodeHasSufficientPID node/k3s-agent-0 Node k3s-agent-0 status is now: NodeHasSufficientPID
25m Normal NodeAllocatableEnforced node/k3s-agent-0 Updated Node Allocatable limit across pods
25m Normal Starting node/k3s-agent-0 Starting kube-proxy.
25m Normal NodeReady node/k3s-agent-0 Node k3s-agent-0 status is now: NodeReady
25m Normal RegisteredNode node/k3s-agent-0 Node k3s-agent-0 event: Registered Node k3s-agent-0 in Controller
25m Normal Starting node/k3s-agent-1 Starting kubelet.
25m Warning InvalidDiskCapacity node/k3s-agent-1 invalid capacity 0 on image filesystem
25m Normal NodeHasSufficientMemory node/k3s-agent-1 Node k3s-agent-1 status is now: NodeHasSufficientMemory
25m Normal NodeHasNoDiskPressure node/k3s-agent-1 Node k3s-agent-1 status is now: NodeHasNoDiskPressure
25m Normal NodeHasSufficientPID node/k3s-agent-1 Node k3s-agent-1 status is now: NodeHasSufficientPID
25m Normal NodeAllocatableEnforced node/k3s-agent-1 Updated Node Allocatable limit across pods
25m Normal Starting node/k3s-agent-1 Starting kube-proxy.
25m Normal NodeReady node/k3s-agent-1 Node k3s-agent-1 status is now: NodeReady
25m Normal RegisteredNode node/k3s-agent-1 Node k3s-agent-1 event: Registered Node k3s-agent-1 in Controller
25m Normal Starting node/k3s-agent-2 Starting kubelet.
25m Warning InvalidDiskCapacity node/k3s-agent-2 invalid capacity 0 on image filesystem
25m Normal NodeHasSufficientMemory node/k3s-agent-2 Node k3s-agent-2 status is now: NodeHasSufficientMemory
25m Normal NodeHasNoDiskPressure node/k3s-agent-2 Node k3s-agent-2 status is now: NodeHasNoDiskPressure
25m Normal NodeHasSufficientPID node/k3s-agent-2 Node k3s-agent-2 status is now: NodeHasSufficientPID
25m Normal NodeAllocatableEnforced node/k3s-agent-2 Updated Node Allocatable limit across pods
25m Normal Starting node/k3s-agent-2 Starting kube-proxy.
25m Normal NodeReady node/k3s-agent-2 Node k3s-agent-2 status is now: NodeReady
25m Normal RegisteredNode node/k3s-agent-2 Node k3s-agent-2 event: Registered Node k3s-agent-2 in Controller
26m Normal Starting node/k3s-control-plane-0 Starting kubelet.
26m Warning InvalidDiskCapacity node/k3s-control-plane-0 invalid capacity 0 on image filesystem
26m Normal NodeHasSufficientMemory node/k3s-control-plane-0 Node k3s-control-plane-0 status is now: NodeHasSufficientMemory
26m Normal NodeHasNoDiskPressure node/k3s-control-plane-0 Node k3s-control-plane-0 status is now: NodeHasNoDiskPressure
26m Normal NodeHasSufficientPID node/k3s-control-plane-0 Node k3s-control-plane-0 status is now: NodeHasSufficientPID
26m Normal NodeAllocatableEnforced node/k3s-control-plane-0 Updated Node Allocatable limit across pods
26m Normal Starting node/k3s-control-plane-0 Starting kube-proxy.
25m Normal NodeReady node/k3s-control-plane-0 Node k3s-control-plane-0 status is now: NodeReady
25m Normal RegisteredNode node/k3s-control-plane-0 Node k3s-control-plane-0 event: Registered Node k3s-control-plane-0 in Controller
25m Normal Starting node/k3s-control-plane-1 Starting kubelet.
25m Warning InvalidDiskCapacity node/k3s-control-plane-1 invalid capacity 0 on image filesystem
25m Normal NodeHasSufficientMemory node/k3s-control-plane-1 Node k3s-control-plane-1 status is now: NodeHasSufficientMemory
25m Normal NodeHasNoDiskPressure node/k3s-control-plane-1 Node k3s-control-plane-1 status is now: NodeHasNoDiskPressure
25m Normal NodeHasSufficientPID node/k3s-control-plane-1 Node k3s-control-plane-1 status is now: NodeHasSufficientPID
25m Normal NodeAllocatableEnforced node/k3s-control-plane-1 Updated Node Allocatable limit across pods
25m Normal Starting node/k3s-control-plane-1 Starting kube-proxy.
25m Normal NodeReady node/k3s-control-plane-1 Node k3s-control-plane-1 status is now: NodeReady
25m Normal RegisteredNode node/k3s-control-plane-1 Node k3s-control-plane-1 event: Registered Node k3s-control-plane-1 in Controller
25m Normal Starting node/k3s-control-plane-2 Starting kubelet.
25m Warning InvalidDiskCapacity node/k3s-control-plane-2 invalid capacity 0 on image filesystem
25m Normal NodeHasSufficientMemory node/k3s-control-plane-2 Node k3s-control-plane-2 status is now: NodeHasSufficientMemory
25m Normal NodeHasNoDiskPressure node/k3s-control-plane-2 Node k3s-control-plane-2 status is now: NodeHasNoDiskPressure
25m Normal NodeHasSufficientPID node/k3s-control-plane-2 Node k3s-control-plane-2 status is now: NodeHasSufficientPID
25m Normal NodeAllocatableEnforced node/k3s-control-plane-2 Updated Node Allocatable limit across pods
25m Normal Starting node/k3s-control-plane-2 Starting kube-proxy.
25m Normal NodeReady node/k3s-control-plane-2 Node k3s-control-plane-2 status is now: NodeReady
25m Normal RegisteredNode node/k3s-control-plane-2 Node k3s-control-plane-2 event: Registered Node k3s-control-plane-2 in Controller
kubectl get pods --show-labels --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE LABELS
cattle-system cattle-cluster-agent-867b645bf4-852ll 0/1 Pending 0 17m app=cattle-cluster-agent,pod-template-hash=867b645bf4
kube-system coredns-854c77959c-8jmp2 0/1 Pending 0 19m k8s-app=kube-dns,pod-template-hash=854c77959c
kube-system helm-install-traefik-qwl78 0/1 Pending 0 19m controller-uid=d4d0cd35-6752-4089-9385-5f192a34d47c,helmcharts.helm.cattle.io/chart=traefik,job-name=helm-install-traefik
kube-system local-path-provisioner-7c458769fb-l2g5h 0/1 Pending 0 19m app=local-path-provisioner,pod-template-hash=7c458769fb
kube-system metrics-server-86cbb8457f-cc9jk 0/1 Pending 0 19m k8s-app=metrics-server,pod-template-hash=86cbb8457f
kubectl --namespace=kube-system describe pod helm-install-traefik-qwl78
root@k3s-control-plane-0:~# kubectl --namespace=kube-system describe pod helm-install-traefik-qwl78
Name: helm-install-traefik-qwl78
Namespace: kube-system
Priority: 0
Node: <none>
Labels: controller-uid=d4d0cd35-6752-4089-9385-5f192a34d47c
helmcharts.helm.cattle.io/chart=traefik
job-name=helm-install-traefik
Annotations: helmcharts.helm.cattle.io/configHash: SHA256=1155364EEC7C9D81A413F9E187ED8628CD250E20343E081F0FB08A8BB4E101CD
Status: Pending
IP:
IPs: <none>
Controlled By: Job/helm-install-traefik
Containers:
helm:
Image: rancher/klipper-helm:v0.4.3
Port: <none>
Host Port: <none>
Args:
install
Environment:
NAME: traefik
VERSION:
REPO:
HELM_DRIVER: secret
CHART_NAMESPACE: kube-system
CHART: https://%{KUBERNETES_API}%/static/charts/traefik-1.81.0.tgz
HELM_VERSION:
TARGET_NAMESPACE: kube-system
NO_PROXY: .svc,.cluster.local,10.42.0.0/16,10.43.0.0/16
Mounts:
/chart from content (rw)
/config from values (rw)
/var/run/secrets/kubernetes.io/serviceaccount from helm-traefik-token-hv62r (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
values:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: chart-values-traefik
Optional: false
content:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: chart-content-traefik
Optional: false
helm-traefik-token-hv62r:
Type: Secret (a volume populated by a Secret)
SecretName: helm-traefik-token-hv62r
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 18m default-scheduler 0/2 nodes are available: 2 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
Warning FailedScheduling 18m default-scheduler 0/2 nodes are available: 2 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
Warning FailedScheduling 17m default-scheduler 0/3 nodes are available: 3 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
Warning FailedScheduling 17m default-scheduler 0/6 nodes are available: 1 node(s) had taint {dedicated: gpu}, that the pod didn't tolerate, 5 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
I'm sorry for that; this behavior is "expected" but not documented.
All your nodes are currently uninitialized (which is described by the taint node.cloudprovider.kubernetes.io/uninitialized: true), because they need the hcloud-cloud-controller-manager (see https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/ for more documentation on this subject).
This is due to the k3s flag --kubelet-arg cloud-provider=external (cf. https://github.com/xunleii/terraform-module-k3s/blob/master/examples/hcloud-k3s/k3s.tf#L16). To fix that, you have two choices (see the sketch after this list):
- Use the cloud-controller-manager from Hetzner (I recommend it, because it gives you the ability to annotate your nodes with some useful labels like failure-domain.beta.kubernetes.io/region or failure-domain.beta.kubernetes.io/zone, which can be used for highly available applications across several availability zones). Be careful: this controller must only be used if your nodes are hosted on Hetzner Cloud. If you use another cloud provider or something else (vSphere for example), you must use the matching cloud-controller-manager.
- Remove the flag from the flag list.
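For the first option, the deployment usually boils down to something like the two commands below; this is only a sketch based on the hcloud-cloud-controller-manager README (the token value is a placeholder), so check that project for the current manifest:
# Sketch: deploy hcloud-cloud-controller-manager (verify against its README)
kubectl -n kube-system create secret generic hcloud --from-literal=token=<YOUR_HCLOUD_API_TOKEN>
kubectl apply -f https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/latest/download/ccm.yaml
For the second option, simply remove the --kubelet-arg cloud-provider=external flag from k3s.tf before applying.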
EDIT: also, hcloud-k3s is only an example, and I do not recommend using it as is; for example, one node has the taint dedicated: gpu.
Okay, I see. I'm probably a born tester for this project. 🤪 Thanks for your quick feedback and great support.
Maybe you could publish another example that generates a plain cluster that can be used with Rancher without further ado - that would be really great. Of course I would be happy to test it again.
All testers are welcome :wink: Thanks also for your responses and for this issue; I need to write more documentation to make this project easier to use and to debug.
Also, I will try to add an example with a very simple cluster, on a less "exotic" provider (like GCP or AWS).