
Windows Terraform - SSH authentication failed

Open tinohager opened this issue 4 years ago • 11 comments

Hi, I am trying to create a k3s cluster on Hetzner Cloud with this Terraform script, but the script runs into a timeout when connecting to the machine over SSH. I tried creating a server manually and assigning the key, and the key worked fine. But when I start the script it unfortunately does not work. Where does it look for the private key?

Error: timeout - last error: SSH authentication failed ([email protected]:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain

module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Connecting to remote host via SSH...
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Host: XXX.XXX.XXX.XXX
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   User: root
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Password: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Private key: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Certificate: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   SSH Agent: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Checking Host Key: false

My Windows commands to generate a key:

# Change the Windows Service config for "OpenSSH Authentication Agent"
sc config "ssh-agent" start= delayed-auto
sc start "ssh-agent"

# Create a private/public key pair
ssh-keygen -t ecdsa -b 521 -f myKey

ssh-add myKey
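
For reference, the key should then show up when listing the agent's identities. A quick check could look like this (assuming the Windows OpenSSH agent is the one in use; <server-ip> is a placeholder for the Hetzner server):

# list the keys currently loaded in the running ssh-agent
ssh-add -l

# verify that a plain ssh login from the same shell works via the agent
ssh root@<server-ip> "echo ok"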

tinohager commented Feb 09 '21 13:02

I don't know why Terraform doesn't use your SSH agent :thinking: ... Just to be sure, are your k3s instances created with your public key?

Also, if I remember correctly, the SSH agent is only available through Pageant on Windows (cf. https://www.terraform.io/docs/language/resources/provisioners/connection.html#agent), so I don't understand why it works for your other instances.

I'm sorry, I have never used Terraform on Windows directly (only on WSL), so I don't know how to resolve this issue :(

xunleii commented Feb 10 '21 12:02

I have now tried with Pageant: I can connect with PuTTY through Pageant (without a password), but it does not work with Terraform.

I am not sure, but shouldn't this be true here?

module.k3s.null_resource.k8s_ca_certificates_install[4] (remote-exec): SSH Agent: false

https://stackoverflow.com/a/58781305/6097503

tinohager commented Feb 10 '21 13:02

I have installed Ubuntu on my Windows machine with WSL2, but I get the same error... From this Linux machine I can connect via SSH with the key file to the Linux server in the Hetzner Cloud.

Error: timeout - last error: SSH authentication failed ([email protected]:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain

ssh-keygen
more /root/.ssh/id_rsa.pub

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=$(dpkg --print-architecture)] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
apt install terraform
git clone https://github.com/xunleii/terraform-module-k3s.git
cd terraform-module-k3s/examples/hcloud-k3s/
terraform init
terraform apply

tinohager commented Feb 10 '21 14:02

I don't see why it is not working. As far as I can see, you are using the given example.

Do you have something like this when you use WSL?

module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Connecting to remote host via SSH...
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Host: XXX.XXX.XXX.XXX
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   User: root
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Password: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Private key: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Certificate: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   SSH Agent: true
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec):   Checking Host Key: false

If yes, can you check whether running the following commands solves your problem?

git clone https://github.com/xunleii/terraform-module-k3s.git
cd terraform-module-k3s/examples/hcloud-k3s/

ssh-keygen -f hcloud.id_rsa
ssh-add hcloud.id_rsa
terraform init
terraform apply --var ssh_key="$(cat hcloud.id_rsa.pub)"

Sorry if it takes some time to solve this problem, I have never encountered this issue :disappointed:

xunleii commented Feb 12 '21 17:02

Thanks for your support. I won't be back in the office until Monday; then I will review your suggestion.

tinohager commented Feb 12 '21 17:02

First I got the error "Could not open a connection to your authentication agent", so I executed eval "$(ssh-agent)".

After that I had problems with the certificates:

module.k3s.null_resource.agents_label["k3s-agent-0_node|node.kubernetes.io/pool"]: Creation complete after 13s [id=4521186966452460074]
module.k3s.null_resource.agents_label["k3s-agent-1_node|node.kubernetes.io/pool"] (remote-exec): node/k3s-agent-1 labeled
module.k3s.null_resource.agents_label["k3s-agent-2_node|node.kubernetes.io/pool"] (remote-exec): node/k3s-agent-2 labeled
module.k3s.null_resource.agents_label["k3s-agent-1_node|node.kubernetes.io/pool"]: Creation complete after 14s [id=495787829675262765]
module.k3s.null_resource.agents_label["k3s-agent-2_node|node.kubernetes.io/pool"]: Creation complete after 14s [id=8595809587793038789]
module.k3s.null_resource.kubernetes_ready: Creating...
module.k3s.null_resource.kubernetes_ready: Creation complete after 0s [id=8453258558289144403]
kubernetes_service_account.bootstrap: Creating...
kubernetes_cluster_role_binding.boostrap: Creating...

Error: Post "https://XX.XX.XX.XX:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings": x509: certificate signed by unknown authority

Error: Post "https://XX.XX.XX.XX:6443/api/v1/namespaces/default/serviceaccounts": x509: certificate signed by unknown authority

tinohager commented Feb 18 '21 13:02

Thanks for your help. Firstly, good news: your cluster is provisioned. But this certificate error is really weird and should not occur. Have you changed anything in examples/hcloud-k3s?

Can you try this inside the examples/hcloud-k3s directory?

# add output with the generated kubeconfig
cat <<EOF >> outputs.tf

output "kubeconfig" {
  value = module.k3s.kube_config
}
EOF

# generate the kubeconfig
terraform output -raw kubeconfig > kubeconfig

# test if you can access on your side
KUBECONFIG=./kubeconfig kubectl version

I think you will have the same certificate problem. If it doesn't work, can you compare the certificates inside the generated kubeconfig with the ones present in /etc/rancher/k3s/k3s.yaml on a control-plane node? They must differ.
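
A rough way to compare the two CA certificates could look like this (just a sketch; <control-plane-ip> is a placeholder for one of your control-plane nodes, and the paths assume the kubeconfig generated above plus a default k3s install):

# extract the CA certificate from the locally generated kubeconfig
grep certificate-authority-data kubeconfig | awk '{print $2}' | base64 -d > local-ca.crt

# extract the CA certificate used by k3s on a control-plane node
ssh root@<control-plane-ip> \
  "grep certificate-authority-data /etc/rancher/k3s/k3s.yaml | awk '{print \$2}' | base64 -d" > remote-ca.crt

# if provisioning went well, this should print nothing
diff local-ca.crt remote-ca.crt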

xunleii commented Feb 18 '21 20:02

Hi, I have reset my WSL Ubuntu container and also created a new clean project in Hetzner Cloud. Now the Terraform setup completes without a failure.

Now I have tried to add the new cluster to my Rancher, but it always stays in pending.

kubectl get events

root@k3s-control-plane-0:~# kubectl get events
LAST SEEN   TYPE      REASON                    OBJECT                     MESSAGE
25m         Normal    Starting                  node/k3s-agent-0           Starting kubelet.
25m         Warning   InvalidDiskCapacity       node/k3s-agent-0           invalid capacity 0 on image filesystem
25m         Normal    NodeHasSufficientMemory   node/k3s-agent-0           Node k3s-agent-0 status is now: NodeHasSufficientMemory
25m         Normal    NodeHasNoDiskPressure     node/k3s-agent-0           Node k3s-agent-0 status is now: NodeHasNoDiskPressure
25m         Normal    NodeHasSufficientPID      node/k3s-agent-0           Node k3s-agent-0 status is now: NodeHasSufficientPID
25m         Normal    NodeAllocatableEnforced   node/k3s-agent-0           Updated Node Allocatable limit across pods
25m         Normal    Starting                  node/k3s-agent-0           Starting kube-proxy.
25m         Normal    NodeReady                 node/k3s-agent-0           Node k3s-agent-0 status is now: NodeReady
25m         Normal    RegisteredNode            node/k3s-agent-0           Node k3s-agent-0 event: Registered Node k3s-agent-0 in Controller
25m         Normal    Starting                  node/k3s-agent-1           Starting kubelet.
25m         Warning   InvalidDiskCapacity       node/k3s-agent-1           invalid capacity 0 on image filesystem
25m         Normal    NodeHasSufficientMemory   node/k3s-agent-1           Node k3s-agent-1 status is now: NodeHasSufficientMemory
25m         Normal    NodeHasNoDiskPressure     node/k3s-agent-1           Node k3s-agent-1 status is now: NodeHasNoDiskPressure
25m         Normal    NodeHasSufficientPID      node/k3s-agent-1           Node k3s-agent-1 status is now: NodeHasSufficientPID
25m         Normal    NodeAllocatableEnforced   node/k3s-agent-1           Updated Node Allocatable limit across pods
25m         Normal    Starting                  node/k3s-agent-1           Starting kube-proxy.
25m         Normal    NodeReady                 node/k3s-agent-1           Node k3s-agent-1 status is now: NodeReady
25m         Normal    RegisteredNode            node/k3s-agent-1           Node k3s-agent-1 event: Registered Node k3s-agent-1 in Controller
25m         Normal    Starting                  node/k3s-agent-2           Starting kubelet.
25m         Warning   InvalidDiskCapacity       node/k3s-agent-2           invalid capacity 0 on image filesystem
25m         Normal    NodeHasSufficientMemory   node/k3s-agent-2           Node k3s-agent-2 status is now: NodeHasSufficientMemory
25m         Normal    NodeHasNoDiskPressure     node/k3s-agent-2           Node k3s-agent-2 status is now: NodeHasNoDiskPressure
25m         Normal    NodeHasSufficientPID      node/k3s-agent-2           Node k3s-agent-2 status is now: NodeHasSufficientPID
25m         Normal    NodeAllocatableEnforced   node/k3s-agent-2           Updated Node Allocatable limit across pods
25m         Normal    Starting                  node/k3s-agent-2           Starting kube-proxy.
25m         Normal    NodeReady                 node/k3s-agent-2           Node k3s-agent-2 status is now: NodeReady
25m         Normal    RegisteredNode            node/k3s-agent-2           Node k3s-agent-2 event: Registered Node k3s-agent-2 in Controller
26m         Normal    Starting                  node/k3s-control-plane-0   Starting kubelet.
26m         Warning   InvalidDiskCapacity       node/k3s-control-plane-0   invalid capacity 0 on image filesystem
26m         Normal    NodeHasSufficientMemory   node/k3s-control-plane-0   Node k3s-control-plane-0 status is now: NodeHasSufficientMemory
26m         Normal    NodeHasNoDiskPressure     node/k3s-control-plane-0   Node k3s-control-plane-0 status is now: NodeHasNoDiskPressure
26m         Normal    NodeHasSufficientPID      node/k3s-control-plane-0   Node k3s-control-plane-0 status is now: NodeHasSufficientPID
26m         Normal    NodeAllocatableEnforced   node/k3s-control-plane-0   Updated Node Allocatable limit across pods
26m         Normal    Starting                  node/k3s-control-plane-0   Starting kube-proxy.
25m         Normal    NodeReady                 node/k3s-control-plane-0   Node k3s-control-plane-0 status is now: NodeReady
25m         Normal    RegisteredNode            node/k3s-control-plane-0   Node k3s-control-plane-0 event: Registered Node k3s-control-plane-0 in Controller
25m         Normal    Starting                  node/k3s-control-plane-1   Starting kubelet.
25m         Warning   InvalidDiskCapacity       node/k3s-control-plane-1   invalid capacity 0 on image filesystem
25m         Normal    NodeHasSufficientMemory   node/k3s-control-plane-1   Node k3s-control-plane-1 status is now: NodeHasSufficientMemory
25m         Normal    NodeHasNoDiskPressure     node/k3s-control-plane-1   Node k3s-control-plane-1 status is now: NodeHasNoDiskPressure
25m         Normal    NodeHasSufficientPID      node/k3s-control-plane-1   Node k3s-control-plane-1 status is now: NodeHasSufficientPID
25m         Normal    NodeAllocatableEnforced   node/k3s-control-plane-1   Updated Node Allocatable limit across pods
25m         Normal    Starting                  node/k3s-control-plane-1   Starting kube-proxy.
25m         Normal    NodeReady                 node/k3s-control-plane-1   Node k3s-control-plane-1 status is now: NodeReady
25m         Normal    RegisteredNode            node/k3s-control-plane-1   Node k3s-control-plane-1 event: Registered Node k3s-control-plane-1 in Controller
25m         Normal    Starting                  node/k3s-control-plane-2   Starting kubelet.
25m         Warning   InvalidDiskCapacity       node/k3s-control-plane-2   invalid capacity 0 on image filesystem
25m         Normal    NodeHasSufficientMemory   node/k3s-control-plane-2   Node k3s-control-plane-2 status is now: NodeHasSufficientMemory
25m         Normal    NodeHasNoDiskPressure     node/k3s-control-plane-2   Node k3s-control-plane-2 status is now: NodeHasNoDiskPressure
25m         Normal    NodeHasSufficientPID      node/k3s-control-plane-2   Node k3s-control-plane-2 status is now: NodeHasSufficientPID
25m         Normal    NodeAllocatableEnforced   node/k3s-control-plane-2   Updated Node Allocatable limit across pods
25m         Normal    Starting                  node/k3s-control-plane-2   Starting kube-proxy.
25m         Normal    NodeReady                 node/k3s-control-plane-2   Node k3s-control-plane-2 status is now: NodeReady
25m         Normal    RegisteredNode            node/k3s-control-plane-2   Node k3s-control-plane-2 event: Registered Node k3s-control-plane-2 in Controller

kubectl get pods --show-labels --all-namespaces

NAMESPACE       NAME                                      READY   STATUS    RESTARTS   AGE   LABELS
cattle-system   cattle-cluster-agent-867b645bf4-852ll     0/1     Pending   0          17m   app=cattle-cluster-agent,pod-template-hash=867b645bf4
kube-system     coredns-854c77959c-8jmp2                  0/1     Pending   0          19m   k8s-app=kube-dns,pod-template-hash=854c77959c
kube-system     helm-install-traefik-qwl78                0/1     Pending   0          19m   controller-uid=d4d0cd35-6752-4089-9385-5f192a34d47c,helmcharts.helm.cattle.io/chart=traefik,job-name=helm-install-traefik
kube-system     local-path-provisioner-7c458769fb-l2g5h   0/1     Pending   0          19m   app=local-path-provisioner,pod-template-hash=7c458769fb
kube-system     metrics-server-86cbb8457f-cc9jk           0/1     Pending   0          19m   k8s-app=metrics-server,pod-template-hash=86cbb8457f

kubectl --namespace=kube-system describe pod helm-install-traefik-qwl78

root@k3s-control-plane-0:~# kubectl --namespace=kube-system describe pod helm-install-traefik-qwl78
Name:           helm-install-traefik-qwl78
Namespace:      kube-system
Priority:       0
Node:           <none>
Labels:         controller-uid=d4d0cd35-6752-4089-9385-5f192a34d47c
                helmcharts.helm.cattle.io/chart=traefik
                job-name=helm-install-traefik
Annotations:    helmcharts.helm.cattle.io/configHash: SHA256=1155364EEC7C9D81A413F9E187ED8628CD250E20343E081F0FB08A8BB4E101CD
Status:         Pending
IP:
IPs:            <none>
Controlled By:  Job/helm-install-traefik
Containers:
  helm:
    Image:      rancher/klipper-helm:v0.4.3
    Port:       <none>
    Host Port:  <none>
    Args:
      install
    Environment:
      NAME:              traefik
      VERSION:
      REPO:
      HELM_DRIVER:       secret
      CHART_NAMESPACE:   kube-system
      CHART:             https://%{KUBERNETES_API}%/static/charts/traefik-1.81.0.tgz
      HELM_VERSION:
      TARGET_NAMESPACE:  kube-system
      NO_PROXY:          .svc,.cluster.local,10.42.0.0/16,10.43.0.0/16
    Mounts:
      /chart from content (rw)
      /config from values (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from helm-traefik-token-hv62r (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  values:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chart-values-traefik
    Optional:  false
  content:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chart-content-traefik
    Optional:  false
  helm-traefik-token-hv62r:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  helm-traefik-token-hv62r
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  18m   default-scheduler  0/2 nodes are available: 2 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  18m   default-scheduler  0/2 nodes are available: 2 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  17m   default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  17m   default-scheduler  0/6 nodes are available: 1 node(s) had taint {dedicated: gpu}, that the pod didn't tolerate, 5 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.

tinohager commented Feb 19 '21 09:02

I'm sorry for that, this behavior is "expected" but not documented. All your nodes are currently uninitialized (which is indicated by the taint node.cloudprovider.kubernetes.io/uninitialized: true), because they need the hcloud-cloud-controller-manager (see https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/ for more documentation on this subject).

This is due to the k3s flag --kubelet-arg cloud-provider=external (cf. https://github.com/xunleii/terraform-module-k3s/blob/master/examples/hcloud-k3s/k3s.tf#L16). In order to fix that, you have two choices:

  • using the cloud-controller-manager from Hetzner (I recommend it, because it gives you the ability to annotate your nodes with some useful labels like failure-domain.beta.kubernetes.io/region or failure-domain.beta.kubernetes.io/zone, which can be used for high-availability applications across several availability zones); see the sketch after this list. Be careful: this controller must only be used if your nodes are hosted on Hetzner Cloud. If you use another cloud provider or something else (vSphere, for example), you must use the matching cloud-controller-manager.
  • removing the flag from the list.
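
For the first option, the install is roughly the following (a sketch from memory of the hcloud-cloud-controller-manager README; double-check the secret name, token key, and manifest URL against the upstream documentation):

# the controller expects a Hetzner Cloud API token in a secret named "hcloud" in kube-system
kubectl -n kube-system create secret generic hcloud \
  --from-literal=token=<your-hcloud-api-token>

# deploy the Hetzner cloud-controller-manager manifest
kubectl apply -f https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/latest/download/ccm.yaml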

EDIT: also, hcloud-k3s is only an example and I do not recommend using it as is; for instance, one node has the taint dedicated: gpu.
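
If you keep the example unchanged, that taint can be removed from the affected node with something like this (the node name is a placeholder):

# remove all taints with the key "dedicated" from the node
kubectl taint nodes <node-name> dedicated-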

xunleii commented Feb 19 '21 17:02

Okay I see. I'm probably a born tester for this project. 🤪 Thanks for your quick feedback and great support.

Maybe you could publish another example that generates a plain cluster that can be used in Rancher without further tweaking - that would be really great. Of course I would be happy to test it again.

tinohager commented Feb 19 '21 18:02

All testers are welcome :wink: Thanks as well for your responses and for this issue; I need to write more documentation in order to make this project easier to use and to debug.

Also, I will try to add an example with a very simple cluster, on a less "exotic" provider (like GCP or AWS)

xunleii commented Feb 19 '21 18:02