ha-sap-terraform-deployments
SSH authentication fails if constraints/compute.requireOsLogin is enforced
Following your guide, this is what I get when I type terraform apply:
module.hana_node.null_resource.hana_node_provisioner[1]: Still creating... [5m0s elapsed]
╷
│ Error: file provisioner error
│
│   with module.hana_node.null_resource.hana_node_provisioner[1],
│   on modules/hana_node/salt_provisioner.tf line 23, in resource "null_resource" "hana_node_provisioner":
│   23: provisioner "file" {
│
│ timeout - last error: SSH authentication failed ([email protected]:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
Hi Thorsten, could you provide us with a bit more information, please?
a) which guide: the GitHub one or the SUSE getting-started documentation
b) which CSP (guess GCE)
c) your settings of the various ssh flags in tfvars:
- did you provide the values for public_key/private_key via files, or in tfvars directly?
- what is your setting of "pre_deployment"? If it's false, did you provide the "cluster_ssh_xxx" settings?
d) which version of the project
@petersatsuse, Thorsten uses the SBP guide I created for GCP.
@tstaerk, I would advise the following:
- If you use an older GitHub version, please ensure that you use the most recent one. The current version is 8.1.0. If in doubt, you may execute the git pull command before creating the environment.
- Could you please share the terraform.tfvars file with us? It contains all the configuration options that you used for your environment.
Some other required information:
- Used SLES4SAP version: Specify the used SLES4SAP version (SLES12SP4, SLES15SP2, etc.)
- Used client machine OS: Specify the OS of the machine used to execute the project (Windows, any Linux distro, macOS). Even though Terraform is multi-platform, some of the local actions are based on Linux distributions, so some operations might fail for this reason.
- Expected behaviour vs. observed behaviour: Did your deployment fail, or did it complete with an error message?
- The provisioning_log_level = "info" option in the terraform.tfvars file is useful to get more information during the execution of the terraform commands, so it is suggested to run the deployment with this option and see what happens before opening any ticket.
- Logs: Upload the deployment logs to make finding the root cause easier (a collection sketch follows the list of files below). The logs might have sensitive secrets exposed; remove them before uploading anything here, or contact me to send the logs privately to the SUSE teams.
Here is the list of required logs (each of the deployed machines will have all of them):
- /var/log/salt-os-setup.log
- /var/log/salt-predeployment.log
- /var/log/salt-deployment.log
- /var/log/salt-result.log
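If it helps, a rough collection sketch like the following (hypothetical host names and key path, adjust to your setup) can pull the files from both nodes before you scrub secrets:

#!/usr/bin/env bash
# Sketch: collect the salt logs from both HANA nodes for attaching to the issue.
# Host names and the key path are examples; adjust them to your environment.
NODES="demo-vmhana01 demo-vmhana02"    # or the public IPs from the terraform output
KEY="$HOME/.ssh/id_rsa"
LOGS="/var/log/salt-os-setup.log /var/log/salt-predeployment.log /var/log/salt-deployment.log /var/log/salt-result.log"
for node in $NODES; do
  mkdir -p "logs/$node"
  for log in $LOGS; do
    scp -i "$KEY" "root@$node:$log" "logs/$node/" || echo "could not fetch $log from $node"
  done
done
# Remember to remove any secrets from the collected files before uploading them.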
just realized I did not define a VPC... if there is only one, can't it just use that one?
OK, I am using GCP and the following tfvars file:
project = "thorstenstaerk-suse-terraforms" gcp_credentials_file = "sa.json" region = "europe-west1" os_image = "suse-sap-cloud/sles-15-sp2-sap" public_key = "/home/admin_/.ssh/id_rsa.pub" private_key = "/home/admin_/.ssh/id_rsa" cluster_ssh_pub = "salt://sshkeys/cluster.id_rsa.pub" cluster_ssh_key = "salt://sshkeys/cluster.id_rsa" ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:/ha-clustering:/sap-deployments:/v8/" provisioning_log_level = "info" pre_deployment = true bastion_enabled = false machine_type = "n1-highmem-16" hana_inst_master="thorstenstaerk-sap-media-extracted/" hana_master_password = "SAP_Pass123"
@tstaerk:
I have just completed a successful deployment using the most recent version, 8.1.0, with the following terraform.tfvars file:
project = "<PROJECT ID>"
gcp_credentials_file = "sa-key.json"
region = "us-west1"
os_image = "suse-sap-cloud/sles-15-sp2-sap"
public_key = "<PATH TO THE SSH KEY>/gcp_key.pub"
private_key = "<PATH TO THE SSH KEY>/gcp_key"
cluster_ssh_pub = "salt://sshkeys/cluster.id_rsa.pub"
cluster_ssh_key = "salt://sshkeys/cluster.id_rsa"
ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:v8/"
provisioning_log_level = "info"
pre_deployment = true
bastion_enabled = false
hana_inst_master = "<GCP BUCKET>/HANA/2.0/SPS05/51054623"
hana_master_password = "YourSAPPassword1234"
hana_primary_site = "NUE"
hana_secondary_site = "FRA"
I see that we use almost the same configuration. Could you please ensure that you use the most recent version, 8.1.0, from the master branch?
I would suggest using a fresh clone to ensure there are no configuration conflicts, or at least executing the git pull command before starting your deployment.
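A minimal sketch of what I mean (paths and the tag check are examples):

# Sketch: start from a fresh clone to rule out local configuration conflicts.
git clone https://github.com/SUSE/ha-sap-terraform-deployments.git
cd ha-sap-terraform-deployments
git describe --tags        # should report 8.1.0, the most recent release at the time of writing
cd gcp                     # copy your terraform.tfvars here before running terraform init/apply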
git pull tells me "already up to date"
Can you please try a fresh clone before digging into the issue?
deleted and re-checked out
OK, your terraform.tfvars and mine are identical except for passwords, names, and your two lines:
hana_primary_site = "NUE"
hana_secondary_site = "FRA"
I repeated with my old terraform.tfvars and I get:
module.hana_node.module.hana-load-balancer[0].google_compute_health_check.health-check: Creating...
╷
│ Error: Error creating Network: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/global/networks/demo-network' already exists, alreadyExists
│
│   with google_compute_network.ha_network[0],
│   on infrastructure.tf line 27, in resource "google_compute_network" "ha_network":
│   27: resource "google_compute_network" "ha_network" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-c/disks/demo-hana-data-1' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.data[1],
│   on modules/hana_node/main.tf line 12, in resource "google_compute_disk" "data":
│   12: resource "google_compute_disk" "data" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-b/disks/demo-hana-data-0' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.data[0],
│   on modules/hana_node/main.tf line 12, in resource "google_compute_disk" "data":
│   12: resource "google_compute_disk" "data" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-b/disks/demo-hana-backup-0' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.backup[0],
│   on modules/hana_node/main.tf line 20, in resource "google_compute_disk" "backup":
│   20: resource "google_compute_disk" "backup" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-c/disks/demo-hana-backup-1' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.backup[1],
│   on modules/hana_node/main.tf line 20, in resource "google_compute_disk" "backup":
│   20: resource "google_compute_disk" "backup" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-c/disks/demo-hana-software-1' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.hana-software[1],
│   on modules/hana_node/main.tf line 28, in resource "google_compute_disk" "hana-software":
│   28: resource "google_compute_disk" "hana-software" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-b/disks/demo-hana-software-0' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.hana-software[0],
│   on modules/hana_node/main.tf line 28, in resource "google_compute_disk" "hana-software":
│   28: resource "google_compute_disk" "hana-software" {
│
╵
╷
│ Error: Error creating HealthCheck: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/global/healthChecks/demo-hana-health-check' already exists, alreadyExists
│
│   with module.hana_node.module.hana-load-balancer[0].google_compute_health_check.health-check,
│   on modules/load_balancer/main.tf line 5, in resource "google_compute_health_check" "health-check":
│   5: resource "google_compute_health_check" "health-check" {
after deleting all the stuff above and re-starting terraform apply, I now get:
│ Error: Error creating InstanceGroup: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-b/instanceGroups/demo-hana-primary-group' already exists, alreadyExists
│
│   with module.hana_node.google_compute_instance_group.hana-primary-group,
│   on modules/hana_node/main.tf line 60, in resource "google_compute_instance_group" "hana-primary-group":
│   60: resource "google_compute_instance_group" "hana-primary-group" {
│
╵
╷
│ Error: Error creating InstanceGroup: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-c/instanceGroups/demo-hana-secondary-group' already exists, alreadyExists
│
│   with module.hana_node.google_compute_instance_group.hana-secondary-group,
│   on modules/hana_node/main.tf line 66, in resource "google_compute_instance_group" "hana-secondary-group":
│   66: resource "google_compute_instance_group" "hana-secondary-group" {
│
╵
╷
│ Error: file provisioner error
│
│   with module.hana_node.null_resource.hana_node_provisioner[1],
│   on modules/hana_node/salt_provisioner.tf line 23, in resource "null_resource" "hana_node_provisioner":
│   23: provisioner "file" {
│
│ timeout - last error: SSH authentication failed ([email protected]:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
╵
╷
│ Error: file provisioner error
│
│   with module.hana_node.null_resource.hana_node_provisioner[0],
│   on modules/hana_node/salt_provisioner.tf line 23, in resource "null_resource" "hana_node_provisioner":
│   23: provisioner "file" {
│
│ timeout - last error: SSH authentication failed ([email protected]:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
ssh cannot work, as Cloud Shell does not have a network connection to a host inside a GCP project
130.211.104.240 is demo-vmhana01
@tstaerk, please execute the terraform destroy command to destroy your environment before any new attempt to create a new environment using the terraform apply command.
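Roughly this sequence, run from the gcp directory of the project (the state check is just an extra sanity step):

# Sketch: tear down the half-created environment before retrying.
terraform destroy        # removes everything recorded in the Terraform state
terraform state list     # should print nothing afterwards; leftover GCP resources need manual clean-up in the console
terraform apply          # retry the deployment from a clean state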
When you ssh to the HANA node using the public IP address, you need to use the SSH private key configured in the terraform.tfvars file. Here is the command format:
ssh -i <SSH PRIVATE KEY> root@<HANA_NODE_PUBLIC_IP_ADDRESS>
Hi, I do not call ssh. I get an error that ssh is not possible, and I think this is because of the isolation between Cloud Shell and the VMs.
ok, makes sense - you use the public IP address. Here is what I get:
admin_@cloudshell:~$ ssh -i .ssh/id_rsa [email protected]
The authenticity of host '130.211.104.240 (130.211.104.240)' can't be established.
ECDSA key fingerprint is SHA256:YgYUATM68uQX/KEEXAqXUm18U+BMR9/1M1iDic7PfVI.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
Host key verification failed.
Three possible troubleshooting steps:
- Ensure that the public SSH key is attached to the two HANA nodes. If not, attach it manually and try it again.
- Ensure that the SSH key pair has the proper permissions: public key -> 600, private key -> 400.
- Try using the -v option with the SSH command to gather more info (see the sketch after this list).
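For example (key path and IP address taken from earlier in this thread, so treat them as placeholders):

# Sketch: fix the key permissions as suggested above, then retry with verbose output.
chmod 600 ~/.ssh/id_rsa.pub
chmod 400 ~/.ssh/id_rsa
ssh -v -i ~/.ssh/id_rsa root@130.211.104.240   # -v shows which keys are offered and why they are rejected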
Two questions come to mind:
- what is salt://sshkeys/cluster.id_rsa.pub? Where does it come from? Can I check if mine is right?
- you said it worked for you, and it uses ssh. So you must have a firewall rule, right?
ok, makes sense - you use the public IP address. Here is what I get:
admin_@cloudshell:~$ ssh -i .ssh/id_rsa [email protected]
The authenticity of host '130.211.104.240 (130.211.104.240)' can't be established.
ECDSA key fingerprint is SHA256:YgYUATM68uQX/KEEXAqXUm18U+BMR9/1M1iDic7PfVI.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
Host key verification failed.
It is perfectly fine that this fails. Just make sure you delete the old host key from your known_hosts file. A bit more context: https://linuxhint.com/host-key-verification-failed-mean/
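Something like this should clear the stale entry (IP address taken from your output):

# Sketch: remove the old host key for the redeployed node, then connect again.
ssh-keygen -R 130.211.104.240
ssh -i ~/.ssh/id_rsa root@130.211.104.240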
Two questions come to mind:
- what is salt://sshkeys/cluster.id_rsa.pub? Where does it come from? Can I check if mine is right?
This is the cluster's ssh key. Normally you don't have to tamper with this.
- you said it worked for you, and it uses ssh. So you must have a firewall rule, right?
You CAN connect via ssh/port-22 so this will not be a firewall issue.
@tstaerk The ssh keys that are used by terraform to connect via ssh and run salt are these:
public_key = "/home/admin_/.ssh/id_rsa.pub"
private_key = "/home/admin_/.ssh/id_rsa"
Did you create these and are you using these also in your test?
@tstaerk In addition to @yeoldegrove notes and questions, you may manually attach the SSH public keys to your nodes as a troubleshooting step.
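One possible way to do that from Cloud Shell is via instance metadata; the sketch below uses the instance names and zones from this deployment and assumes root as the key's user, so adjust as needed:

# Sketch: manually attach the public key to both HANA nodes via instance metadata.
# Note: this sets the per-instance ssh-keys value, replacing any existing entry there.
gcloud compute instances add-metadata demo-vmhana01 --zone europe-west1-b \
  --metadata "ssh-keys=root:$(cat ~/.ssh/id_rsa.pub)"
gcloud compute instances add-metadata demo-vmhana02 --zone europe-west1-c \
  --metadata "ssh-keys=root:$(cat ~/.ssh/id_rsa.pub)"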
added the authorized_keys file manually to both nodes, now the install looks like it's doing something!
install finished, hdbsql answers my SQL queries. Please make sure the authorized_keys get created automatically!
@tstaerk There is of course already code that handles this https://github.com/SUSE/ha-sap-terraform-deployments/blob/main/gcp/modules/hana_node/main.tf#L155
Are you sure you created the key files and set the correct variables in terraform.tfvars?
reproducing it now
@yeoldegrove : looking at https://github.com/SUSE/ha-sap-terraform-deployments/blob/main/gcp/modules/hana_node/main.tf#L155, you only add the ssh key to the instance's metadata, so, ssh passwordless login would only work if the project is set to os_login=false, right? Ever tested it with os_login=true?
@tstaerk I still do not get which exact problem you're having and trying to solve. Could you elaborate on that?
ssh keys are added to the instance's metadata the usual way, as you pointed out. Are you using the "Cloud Console"? AFAIK most users use their workstations to deploy this. https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#metadata points out that keys added by the Cloud Console will be removed. Maybe this is your issue?
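To double-check whether the keys actually landed in the instance metadata, something along these lines should work (instance name and zone are the ones from this deployment):

# Sketch: inspect the instance metadata and look for the ssh-keys entry.
gcloud compute instances describe demo-vmhana01 --zone europe-west1-b --format="yaml(metadata)"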
Also, I am not sure what you mean by os_login=true/false. Where would I set this?
you would go to the Cloud Console, search for "Metadata", select it, and there set the key os_login to the value false. Then, the ssh key set via https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#metadata will be respected.
@tstaerk are you talking about https://console.cloud.google.com/compute/metadata where I could set e.g. https://cloud.google.com/compute/docs/oslogin/set-up-oslogin ?
Just so that I do not miss anything out... Could you please sum up what exactly is not working for you (your use case) and how exactly you solved it?
Would just setting enable-oslogin=FALSE in https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#metadata fix it for you?
We found the error, we had an organisation policy (constraints/compute.requireOsLogin) active that enforced every project to have enable-oslogin=true.
This led to the ssh authentication error described at the top of this issue.
Also, host key verification was not the problem:
admin_@cloudshell:~/ha-sap-terraform-deployments/gcp (tstaerk-tf-demo)$ ssh -o StrictHostKeyChecking=no [email protected]
Warning: Permanently added '34.79.69.80' (ECDSA) to the list of known hosts.
[email protected]: Permission denied (publickey).
The issue was that the public ssh key was not automatically added to the HANA node's authorized_keys. To change this, we set enable-oslogin=false in the project metadata, see Screenshot:
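For reference, the same project-level change can also be made from the command line; a rough equivalent (untested here) would be:

# Sketch: disable OS Login for the whole project via project-wide metadata.
# Note: if the constraints/compute.requireOsLogin org policy is still enforced,
# it overrides this setting, so the policy has to be lifted first.
gcloud compute project-info add-metadata --metadata enable-oslogin=FALSE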
then, ssh'ing worked and the key could be found in authorized_keys:
admin_@cloudshell:~/ha-sap-terraform-deployments/gcp (tstaerk-tf-demo)$ ssh -o StrictHostKeyChecking=no [email protected]
SUSE Linux Enterprise Server 15 SP2 for SAP Applications x86_64 (64-bit)
As "root" (sudo or sudo -i) use the:
- zypper command for package management
- yast command for configuration management
Management and Config: https://www.suse.com/suse-in-the-cloud-basics
Documentation: https://www.suse.com/documentation/sles-15/
Community: https://community.suse.com/
Have a lot of fun...
demo-hana02:~ # cat .ssh/authorized_keys
# Added by Google
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfWjWgE1NkXnmv0UgAkm+zHnJ2UJgTVpMEAlc3Fo+tH6U1BsPL++ceiE+mAAjcT41j7Ew5N4qyranPSTQOvrLSGvCITP4edAJlbrh4JOzy5/aNP/EfWZiprtytrkdBEzd0gbhg+Bh98FlEUoxLtZSFsP2090zI7hTuT9DEB3eQknMkR9g+JsgGcDd0t4kdERaLZp+spkPCJF3LQ2h+9ZbmHqwBjzYLsJLRMma3y+aU80IHONBOEaX+ab+1vR1CuxMBwRjSlDkfRVBuxMWnj+ipQaLjiMLFaGbANFxPFj4AaeDnYO/jnKUaIRQOEAvpgjN9r5hVsRT0I+cpBvTpqcrx admin_@cs-485070161371-default-boost-wds4w
So, one solution would be to manually copy the public ssh key into the OS's authorized_keys file. Another option could be to check whether constraints/compute.requireOsLogin is enforced and, if yes, tell the user that they have to manually copy the ssh key to all nodes.
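Checking for the constraint could look roughly like this (a sketch; the project ID is the one from this thread):

# Sketch: check whether the requireOsLogin org policy is enforced for the project.
gcloud resource-manager org-policies describe compute.requireOsLogin \
  --project thorstenstaerk-suse-terraforms --effective
# If the output shows "enforced: true", warn the user that the public key from
# terraform.tfvars will not end up in authorized_keys automatically.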