docker-machine-driver-hetzner

Bug: Provisioning a cluster with a template deployed via Terraform leads to an error

Open KittlitzMichael opened this issue 1 year ago • 7 comments

Error-Message in Rancher UI: [cmdCreateInner] error setting machine configuration from flags provided: --hetzner-image and --hetzner-image-id are mutually exclusive:Timeout waiting for ssh key

The terraform-script:

`
resource "rancher2_node_template" "hetzner_create_template" {
  provider = rancher2.admin
  name = "hetzner-node-default-template"
  description = "Default template to acquire Hetzner Cloud Nodes"
  driver_id = rancher2_node_driver.hetzner_node_driver.id
  hetzner_config {
    api_token = var.hcloud_api_token
    image = var.base_image
    server_type = var.worker_type
    server_location = var.location
    networks = hcloud_network.kubenet.ip_range
    userdata = <<EOF
packages:
 - ufw
 - fail2ban
package_update: true
package_upgrade: true
runcmd:
 - sed -i 's/[#]*PermitRootLogin yes/PermitRootLogin prohibit-password/g' /etc/ssh/sshd_config
 - sed -i 's/[#]*PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config
 - systemctl restart sshd
 - ufw allow proto tcp from any to any port 22
 - ufw allow from 10.43.0.0/16
 - ufw allow from 10.42.0.0/16
 - ufw allow from 10.0.0.0/16
 - ufw allow from 10.112.0.0/16
 - ufw allow from 10.244.0.0/16
 - ufw -f default deny incoming
 - ufw -f default allow outgoing
 - ufw -f enable
EOF
  }
  #BUG -> https://github.com/JonasProgrammer/docker-machine-driver-hetzner/issues/85
  #additional_keys = data.hcloud_ssh_keys.ssh_keys_master.ssh_keys.*.name
  #key-label = "terraform=rancher"
}
`

KittlitzMichael avatar Aug 06 '22 09:08 KittlitzMichael

The mutual exclusion logic is rather funky and is still there, unchanged from the archaic Cloud beta times.

Could you please try with those manual build artifacts?

JonasProgrammer avatar Aug 06 '22 10:08 JonasProgrammer

Good morning Jonas,

That error seemed to be solved, but it is not!

First, the following issues occurred:

a) The Hetzner logo is missing in the "Add Cluster" dialogue in the UI.

b) Error creating machine: Error in driver during machine creation: network '10.112.0.0/16' not found -> the network is set up on Hetzner, though, and is already working on the 3 Rancher2 servers.

c) Loading the Hetzner-Driver via Rancher UI shows a wrong Image -> Cent OS 7 instead of the correctly set ubuntu - Version, which is correctly shown on the API-Edit-Output: { "annotations": { "ownerBindingsCreated": "true" }, "baseType": "nodeTemplate", "created": "2022-08-08T08:56:06Z", "createdTS": 1659948966000, "creatorId": "user-knh8f", "description": "Default template to acquire Hetzner Cloud Nodes", "driver": "hetzner", "hetznerConfig": { "apiToken": "NhdA9CP9VhZHwd1tDWtsI3jOkGvks83cU4915oxdcuIbYeOpWNfAvU0wMKjKfFeE", "autoSpread": false, "disablePublic": false, "disablePublic4": false, "disablePublic6": false, "existingKeyId": "0", "existingKeyPath": "", "image": "ubuntu-20.04", "imageId": "0", "networks": [ "10.112.0.0/16" ], ...

====

I then deleted the configuration applied via Terraform and set it up manually without the network, but with the correct SSH keys (which are stored correctly via the UI in "hetznerConfig": { "additionalKey": [ 4 items "ssh-rsa ...).

But I get this error again: [cmdCreateInner] error setting machine configuration from flags provided: --hetzner-image and --hetzner-image-id are mutually exclusive:Timeout waiting for ssh key

KittlitzMichael avatar Aug 08 '22 09:08 KittlitzMichael

Now that is really strange... ImageID is only ever assigned once (from the flag value in SetConfigFromFlags) and the check runs on it being non-zero -- which should not be the case from the output you provided.

As stated before, I'm not familiar with Terraform, but is there any chance you could get an equivalent of docker-machine's --debug flag set? If so, I could add some more debug statements on the branch and we can see what is actually passed.

JonasProgrammer avatar Aug 08 '22 17:08 JonasProgrammer

Hi Jonas,

I set up the cluster again from scratch and have the following findings:

  1. Your driver can be deployed without problems via Rancher2 and Terraform.

But the templates cause problems. I set up one via Rancher2/Terraform and one manually. The one deployed with Rancher2/Terraform does not allow the SSH keys to be installed, as we already know. And the "mutually exclusive" error does occur with these settings and your driver.

The difference between the manually installed template and the deployed one can be seen via the API:

ERROR (automatically deployed) { "annotations": { "ownerBindingsCreated": "true" }, "baseType": "nodeTemplate", "cloudCredentialId": null, "created": "2022-08-10T05:21:58Z", "createdTS": 1660108918000, "creatorId": "user-mt96c", "description": "Default template to acquire Hetzner Cloud Nodes", "driver": "hetzner", "hetznerConfig": { "additionalKey": [ 0 items ], "apiToken": "***WORKING-TOKEN***", "autoSpread": false, "disablePublic": false, "disablePublic4": false, "disablePublic6": false, "existingKeyId": "0", "existingKeyPath": "", "firewalls": [ ], "image": "ubuntu-20.04", "imageId": "15512617", "networks": [ "10.112.0.0/16" ], "placementGroup": "", "serverLabel": [ "cattle.io/creator=norman" ], "serverLocation": "nbg1", "serverType": "cx41", "sshPort": "22", "sshUser": "root", "usePrivateNetwork": false, "userData": "packages:\r\n - ufw\r\n - fail2ban\r\npackage_update: true\r\npackage_upgrade: true\r\nruncmd:\r\n - sed -i 's/[#]*PermitRootLogin yes/PermitRootLogin prohibit-password/g' /etc/ssh/sshd_config\r\n - sed -i 's/[#]*PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config\r\n - systemctl restart sshd\r\n - ufw allow proto tcp from any to any port 22\r\n - ufw allow from 10.43.0.0/16\r\n - ufw allow from 10.42.0.0/16\r\n - ufw allow from 10.0.0.0/16\r\n - ufw allow from 10.112.0.0/16\r\n - ufw allow from 10.244.0.0/16\r\n - ufw -f default deny incoming\r\n - ufw -f default allow outgoing\r\n - ufw -f enable" }, "id": "cattle-global-nt:nt-jsggc", "labels": { "cattle.io/creator": "norman" }, "links": { "nodePools": "…/v3/nodePools?nodeTemplateId=cattle-global-nt%3Ant-jsggc", "nodes": "…/v3/nodes?nodeTemplateId=cattle-global-nt%3Ant-jsggc", "remove": "…/v3/nodeTemplates/cattle-global-nt:nt-jsggc", "self": "…/v3/nodeTemplates/cattle-global-nt:nt-jsggc", "update": "…/v3/nodeTemplates/cattle-global-nt:nt-jsggc" }, "name": "hetzner-node-default-template", "principalId": "local://user-mt96c", "state": "active", "transitioning": "no", 
"transitioningMessage": "", "type": "nodeTemplate", "useInternalIpAddress": true, "uuid": "37de086f-c7de-4108-8b5e-35413b9a9e0b" }

WORKING BUT WITH AN ERROR TOO (!) { "amazonec2Config": null, "annotations": { "ownerBindingsCreated": "true" }, "baseType": "nodeTemplate", "cloudCredentialId": null, "created": "2022-08-10T13:07:45Z", "createdTS": 1660136865000, "creatorId": "user-mt96c", "driver": "hetzner", "engineInstallURL": "https://releases.rancher.com/install-docker/20.10.sh", "engineRegistryMirror": [ ], "hetznerConfig": { "additionalKey": [ 7 items "ssh-rsa AAA***KEY1***", "ssh-ed25519 AAAAC3N***KEY2***", "ssh-ed25519 AAAAC3N***KEY3***", "ssh-ed25519 AAAAC3N***KEY4***" ], "apiToken": "***WORKING-TOKEN***", "autoSpread": false, "disablePublic": false, "disablePublic4": false, "disablePublic6": false, "existingKeyId": "0", "existingKeyPath": "", "firewalls": [ ], "image": "ubuntu-18.04", "imageId": "15512617", "networks": [ "1912928" ], "placementGroup": "", "serverLabel": [ ], "serverLocation": "nbg1", "serverType": "cx41", "sshPort": "22", "sshUser": "root", "usePrivateNetwork": false, "userData": "packages:\n - ufw\n - fail2ban\npackage_update: true\npackage_upgrade: true\nruncmd:\n - sed -i 's/[#]*PermitRootLogin yes/PermitRootLogin prohibit-password/g' /etc/ssh/sshd_config\n - sed -i 's/[#]*PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config\n - systemctl restart sshd\n - ufw allow proto tcp from any to any port 22\n - ufw allow from 10.43.0.0/16\n - ufw allow from 10.42.0.0/16\n - ufw allow from 10.0.0.0/16\n - ufw allow from 10.112.0.0/16\n - ufw allow from 10.244.0.0/16\n - ufw -f default deny incoming\n - ufw -f default allow outgoing\n - ufw -f enable" }, "id": "cattle-global-nt:nt-lw7ns", "labels": { "cattle.io/creator": "norman" }, "links": { "nodePools": "…/v3/nodePools?nodeTemplateId=cattle-global-nt%3Ant-lw7ns", "nodes": "…/v3/nodes?nodeTemplateId=cattle-global-nt%3Ant-lw7ns", "self": "…/v3/nodeTemplates/cattle-global-nt:nt-lw7ns", "update": "…/v3/nodeTemplates/cattle-global-nt:nt-lw7ns" }, "name": "axc-hetzner-node-template", "principalId": 
"local://user-mt96c", "state": "active", "transitioning": "no", "transitioningMessage": "", "type": "nodeTemplate", "useInternalIpAddress": true, "uuid": "0e658dfa-c179-48eb-bfee-7d2c3add3517"

====

Findings: The manually set up template does NOT store Ubuntu 20.04 correctly. It does store the SSH keys correctly.

In the API you find these two lines for the manually set up template (where I chose the 20.04 version via the pull-down!): "image": "ubuntu-18.04", "imageId": "15512617",

and in the deployed one: "image": "ubuntu-20.04", "imageId": "15512617", with the same ImageID as for Ubuntu 18.04. As you know, I cannot set the value for ImageId via Terraform.

So I believe your driver should look at the string values of Image or ImageId, fetch both values from Hetzner, and then either set them correctly or check that the stored values match.

I will try to compile your driver with Ubuntu 20.04 as the default value and see if I can deploy that to check what happens then.

**** UPDATE **** I've checked the configuration on Hetzner via curl -H "Authorization: Bearer <API-TOKEN>" https://api.hetzner.cloud/v1/servers > hetzner.txt

"image": { "id": 15512617, "type": "system", "status": "available", "name": "ubuntu-20.04", "description": "Ubuntu 20.04", "image_size": null, "disk_size": 5, "created": "2020-04-23T17:55:14+00:00", "created_from": null, "bound_to": null, "os_flavor": "ubuntu", "os_version": "20.04", "rapid_deploy": true, "protection": { "delete": false },

So the ImageId is obviously correct and the values are set correctly via the manual template. When I do edit the template via GUI, the correct Ubuntu 20.04 - Version is shown too. But not in the API on Rancher2. It seems to be another issue then.

**** Part III - Checked the Images and ImageIds **** via curl -H "Authorization: Bearer <API-Token>" \ https://api.hetzner.cloud/v1/images

Ubuntu 18.04 -> 168855
Ubuntu 20.04 -> 15512617
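The name/ID consistency check suggested earlier in this comment could be sketched roughly as follows in Go. This is only an illustration, not code from the driver: `indexImages`, `checkImagePair` and `sampleImages` are hypothetical names, the sample data is hand-written from the two name/ID pairs listed above, and the top-level `images` array is an assumption about the shape of the listing response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// image mirrors only the two fields of an image object used in this sketch.
type image struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// sampleImages is hand-written sample data (assumed response shape);
// the two entries use the name/ID pairs quoted in this thread.
const sampleImages = `{"images":[
  {"id":168855,"name":"ubuntu-18.04"},
  {"id":15512617,"name":"ubuntu-20.04"}
]}`

// indexImages builds a name -> ID lookup from a JSON images listing.
func indexImages(raw []byte) (map[string]int, error) {
	var body struct {
		Images []image `json:"images"`
	}
	if err := json.Unmarshal(raw, &body); err != nil {
		return nil, err
	}
	byName := make(map[string]int, len(body.Images))
	for _, img := range body.Images {
		byName[img.Name] = img.ID
	}
	return byName, nil
}

// checkImagePair reports an error when a stored name and ID disagree.
// An ID of 0 is treated as "no ID stored" and is accepted.
func checkImagePair(byName map[string]int, name string, id int) error {
	want, ok := byName[name]
	if !ok {
		return fmt.Errorf("unknown image name %q", name)
	}
	if id != 0 && id != want {
		return fmt.Errorf("image %q has ID %d, but the template stores %d", name, want, id)
	}
	return nil
}

func main() {
	byName, err := indexImages([]byte(sampleImages))
	if err != nil {
		panic(err)
	}
	// Pair stored in the Terraform-deployed template.
	fmt.Println(checkImagePair(byName, "ubuntu-20.04", 15512617))
	// Pair stored in the manually created template.
	fmt.Println(checkImagePair(byName, "ubuntu-18.04", 15512617))
}
```

With the values from this thread, the first pair is consistent and the second is flagged, which matches the observation that the manual template shows "ubuntu-18.04" next to the ID of Ubuntu 20.04.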

KittlitzMichael avatar Aug 11 '22 06:08 KittlitzMichael

Hi Michael,

see my comment in the other issue regarding the possibility of setting ImageID from the template.

As for this specific issue: I have re-added the check for the default value, as it was before, in 913904b88. This should, in theory, work with the values you have shown -- provided the mapping actually works the way I understand it.

Regarding the discrepancy of "image": "ubuntu-18.04" and "imageId": "15512617": This may actually be correct, because older driver versions always set ubuntu-18.04 for Image, regardless of it being overridden by name or ID from the command line. Therefore, the configuration is correct in terms of what older driver versions would have produced in their stored config; note that for all downstream steps the ID takes precedence if set, so this should not cause any problems other than looking odd.
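To make the precedence rule concrete, here is a minimal Go sketch. It is not taken from the driver source: `storedConfig` and `imageSelector` are hypothetical names, and the only behaviour modelled is the one described above (a non-zero ID wins over the stored name).

```go
package main

import "fmt"

// storedConfig is a hypothetical stand-in for the persisted driver config.
type storedConfig struct {
	Image   string // image name; older versions always stored "ubuntu-18.04"
	ImageID int    // 0 means no ID was set
}

// imageSelector describes which image a later step would request:
// by ID when one is set, otherwise by name.
func imageSelector(c storedConfig) string {
	if c.ImageID != 0 {
		return fmt.Sprintf("id:%d", c.ImageID)
	}
	return "name:" + c.Image
}

func main() {
	// Values as stored in the manually created template from this thread.
	odd := storedConfig{Image: "ubuntu-18.04", ImageID: 15512617}
	fmt.Println(imageSelector(odd))

	// Without an ID, the name is what counts.
	byName := storedConfig{Image: "ubuntu-18.04"}
	fmt.Println(imageSelector(byName))
}
```

Under this reading, the odd-looking stored name is cosmetic as long as the ID is set.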

Now my changes in that feature branch actually were breaking backwards compatibility: the driver was now expecting an empty string, rather than ubuntu-18.04, when checking whether an image name was provided together with an ID. This is much more idiomatic in terms of Go (and frankly should have been how it's done from the very beginning), but it does cause problems if someone were to derive flags from an existing config and pass them to a new version of the driver -- which may have been what caused your issues.

I have run an additional manual build for that commit, but only now realized other users may not actually be able to access their artifacts -- sorry if I caused confusion in the comment above. Can you please try again with the latest commit? You can either build it yourself or tell me the architecture you're running terraform on -- the multi-arch build artifact is too big to be attached to a comment, unfortunately.

JonasProgrammer avatar Aug 12 '22 23:08 JonasProgrammer

Hi Jonas,

I downloaded the source for https://github.com/JonasProgrammer/docker-machine-driver-hetzner/blob/bugfix/86-terraform-image/driver.go, built the docker-image with the command goreleaser release --snapshot --skip-publish --rm-dist

and deployed it on our own server. Terraform updated the driver in Rancher and the template. The GUI looked good: it showed the correct version, Ubuntu 20.04.

If you like, contact me directly (you will find my profile on XING and LinkedIn), so we can have a session together with our Rancher2 instance and immediately test the result of changes.

The test deployment of a single-node cluster with the template did not work, though:

_apiVersion: management.cattle.io/v3 kind: Node metadata: annotations: cleanup.cattle.io/user-node-remove: "true" field.cattle.io/creatorId: user-mt96c lifecycle.cattle.io/create.node-controller: "true" lifecycle.cattle.io/create.nodepool-provisioner: "true" nodepool.cattle.io/delete-node: "true" creationTimestamp: "2022-08-13T05:47:53Z" finalizers:

  • controller.cattle.io/node-controller generateName: m- generation: 4 labels: cattle.io/creator: norman managedFields:
  • apiVersion: management.cattle.io/v3 fieldsType: FieldsV1 fieldsV1: f:metadata: f:annotations: .: {} f:cleanup.cattle.io/user-node-remove: {} f:field.cattle.io/creatorId: {} f:lifecycle.cattle.io/create.node-controller: {} f:lifecycle.cattle.io/create.nodepool-provisioner: {} f:nodepool.cattle.io/delete-node: {} f:finalizers: .: {} v:"controller.cattle.io/node-controller": {} f:generateName: {} f:labels: .: {} f:cattle.io/creator: {} f:spec: .: {} f:controlPlane: {} f:customConfig: {} f:desiredNodeTaints: {} f:displayName: {} f:etcd: {} f:imported: {} f:internalNodeSpec: {} f:metadataUpdate: .: {} f:annotations: {} f:labels: {} f:nodePoolName: {} f:nodeTemplateName: {} f:requestedHostname: {} f:worker: {} f:status: .: {} f:conditions: {} f:internalNodeStatus: .: {} f:daemonEndpoints: .: {} f:kubeletEndpoint: .: {} f:Port: {} f:nodeInfo: .: {} f:architecture: {} f:bootID: {} f:containerRuntimeVersion: {} f:kernelVersion: {} f:kubeProxyVersion: {} f:kubeletVersion: {} f:machineID: {} f:operatingSystem: {} f:osImage: {} f:systemUUID: {} f:nodeTemplateSpec: .: {} f:cloudCredentialName: {} f:description: {} f:displayName: {} f:driver: {} f:engineInstallURL: {} f:useInternalIpAddress: {} manager: rancher operation: Update time: "2022-08-13T05:48:12Z" name: m-xlqqr namespace: c-m4qqn resourceVersion: "1996883" uid: 48ea2597-6448-405a-97d2-bdf83ec061a6 spec: controlPlane: true customConfig: null desiredNodeTaints: null displayName: "" etcd: true imported: false internalNodeSpec: {} metadataUpdate: annotations: {} labels: {} nodePoolName: c-m4qqn:np-f45kt nodeTemplateName: cattle-global-nt:nt-jsggc requestedHostname: axc-test2-single-node1 worker: true status: conditions:
  • lastUpdateTime: "2022-08-13T05:47:56Z" status: "True" type: Initialized
  • lastUpdateTime: "2022-08-13T05:48:12Z" message: '[cmdCreateInner] error setting machine configuration from flags provided: --hetzner-image and --hetzner-image-id are mutually exclusive' reason: Error status: "False" type: Provisioned
  • lastUpdateTime: "2022-08-13T05:48:12Z" message: Timeout waiting for ssh key reason: Error status: "False" type: Saved internalNodeStatus: daemonEndpoints: kubeletEndpoint: Port: 0 nodeInfo: architecture: "" bootID: "" containerRuntimeVersion: "" kernelVersion: "" kubeProxyVersion: "" kubeletVersion: "" machineID: "" operatingSystem: "" osImage: "" systemUUID: "" nodeTemplateSpec: cloudCredentialName: "" description: Default template to acquire Hetzner Cloud Nodes displayName: hetzner-node-default-template-via-terraform driver: hetzner engineInstallURL: https://releases.rancher.com/install-docker/17.03.2.sh useInternalIpAddress: true_

KittlitzMichael avatar Aug 13 '22 05:08 KittlitzMichael

Hi Michael,

I have added some debug code (hidden behind go build -tags flag_debug) on the branch -- it pretty much dumps the entire driver config as well as flags passed in the error string. If that does not lead anywhere, I'm somewhat out of options.

I don't have XING, LinkedIn and the like, but my mail address is public on GitHub (much to the delight of spammers).

JonasProgrammer avatar Aug 13 '22 07:08 JonasProgrammer

I will release the fix allowing both an empty or the old default image being passed along with an ID tomorrow.

This exemption lets old configurations (and command line arguments derived from those) continue to work, while also allowing a more idiomatic way of passing an empty image name to indicate the ID should be used.
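As a rough illustration of the exemption described above, the relaxed check might look like the following Go sketch. The function and constant names are hypothetical and the actual driver code may differ; only the error text and the rule (an ID may be combined with an empty name or with the old default ubuntu-18.04) come from this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// legacyDefaultImage is the name older driver versions always stored.
const legacyDefaultImage = "ubuntu-18.04"

// errMutuallyExclusive reuses the message reported in this issue.
var errMutuallyExclusive = errors.New("--hetzner-image and --hetzner-image-id are mutually exclusive")

// validateImageFlags is a hypothetical helper: an image ID may be combined
// with an empty image name or with the legacy default name, but not with
// any other name.
func validateImageFlags(image string, imageID int) error {
	if imageID == 0 {
		return nil // no ID given, so nothing can conflict
	}
	if image == "" || image == legacyDefaultImage {
		return nil // exempted combinations
	}
	return errMutuallyExclusive
}

func main() {
	fmt.Println(validateImageFlags("", 15512617))             // idiomatic: ID only
	fmt.Println(validateImageFlags("ubuntu-18.04", 15512617)) // old stored config
	fmt.Println(validateImageFlags("ubuntu-20.04", 15512617)) // explicit name plus ID
}
```

Note that under this sketch a non-default name passed together with an ID is still rejected; whether the released fix treats that case differently is not stated in the thread.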

JonasProgrammer avatar Aug 16 '22 20:08 JonasProgrammer