ha-sap-terraform-deployments

SSH authentication fails if constraints/compute.requireOsLogin is enforced

Open tstaerk opened this issue 2 years ago • 36 comments

Following your guide, when I type terraform apply I get:

module.hana_node.null_resource.hana_node_provisioner[1]: Still creating... [5m0s elapsed]
╷
│ Error: file provisioner error
│
│   with module.hana_node.null_resource.hana_node_provisioner[1],
│   on modules/hana_node/salt_provisioner.tf line 23, in resource "null_resource" "hana_node_provisioner":
│   23: provisioner "file" {
│
│ timeout - last error: SSH authentication failed ([email protected]:22): ssh: handshake failed:
│ ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
╵

tstaerk avatar May 19 '22 05:05 tstaerk

Hi Thorsten, could you provide us with a bit more information, please?

a) Which guide: the GitHub README or the SUSE getting-started documentation?
b) Which CSP (I guess GCE)?
c) Your settings for the various ssh flags in tfvars:

  • did you provide the values for public_key/private_key via files, or directly in tfvars?
  • what is your setting of "pre_deployment"? If it is false, did you provide the "cluster_ssh_xxx" settings?

d) Which version of the project?

petersatsuse avatar May 19 '22 09:05 petersatsuse

@petersatsuse, Thorsten uses the SBP guide I created for GCP.

@tstaerk, I would advise the following:

  1. If you are using an older version of the GitHub project, please ensure that you update to the most recent one. The current version is 8.1.0. If in doubt, you may execute the git pull command before creating the environment.
  2. Could you please share the terraform.tfvars file with us? It contains all the configuration values that you used for your environment.

Some other required information:

  1. Used SLES4SAP version: Specify the used SLES4SAP version (SLES12SP4, SLES15SP2, etc.)
  2. Used client machine OS: Specify the used machine OS to execute the project (Windows, any Linux distro, macOS). Even though terraform is multi-platform, some of the local actions are based on Linux distributions so some operations might fail for this reason.
  3. Expected behaviour vs. observed behaviour: Did your deployment fail, or did it complete with an error message?
  4. The provisioning_log_level = "info" option in the terraform.tfvars file is useful for getting more information during the execution of the terraform commands, so it is suggested to run the deployment with this option and see what happens before opening any ticket.
  5. Logs: Upload the deployment logs to make the root cause finding easier. The logs might have sensitive secrets exposed. Remove them before uploading anything here. Otherwise, contact me to send the logs privately to the SUSE teams.

Here is the list of the required logs (each of the deployed machines will have all of them); a rough collection sketch follows the list:

  • /var/log/salt-os-setup.log
  • /var/log/salt-predeployment.log
  • /var/log/salt-deployment.log
  • /var/log/salt-result.log
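Not a requirement, but a rough sketch of collecting those files in one go from your client machine (standard ssh/scp only, nothing project-specific; the key path and node IPs below are placeholders to replace with your own):

KEY=~/.ssh/id_rsa                              # private key from your tfvars
for ip in 198.51.100.10 198.51.100.11; do      # your HANA nodes' public IPs
  mkdir -p "logs/$ip"
  scp -i "$KEY" "root@$ip:/var/log/salt-*.log" "logs/$ip/"
done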

ab-mohamed avatar May 19 '22 12:05 ab-mohamed

just realized I did not define a VPC... if there is only one, can't it use this?

tstaerk avatar May 19 '22 14:05 tstaerk

OK, I am using GCP and the following tfvars file:

project = "thorstenstaerk-suse-terraforms" gcp_credentials_file = "sa.json" region = "europe-west1" os_image = "suse-sap-cloud/sles-15-sp2-sap" public_key = "/home/admin_/.ssh/id_rsa.pub" private_key = "/home/admin_/.ssh/id_rsa" cluster_ssh_pub = "salt://sshkeys/cluster.id_rsa.pub" cluster_ssh_key = "salt://sshkeys/cluster.id_rsa" ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:/ha-clustering:/sap-deployments:/v8/" provisioning_log_level = "info" pre_deployment = true bastion_enabled = false machine_type = "n1-highmem-16" hana_inst_master="thorstenstaerk-sap-media-extracted/" hana_master_password = "SAP_Pass123"

tstaerk avatar May 19 '22 14:05 tstaerk

@tstaerk:

I have just completed a successful deployment using the most recent version, 8.1.0, using the following terraform.tfvars file:

project = "<PROJECT ID>"
gcp_credentials_file = "sa-key.json"
region = "us-west1"
os_image = "suse-sap-cloud/sles-15-sp2-sap"
public_key  = "<PATH TO THE SSH KEY>/gcp_key.pub"
private_key = "<PATH TO THE SSH KEY>/gcp_key"
cluster_ssh_pub = "salt://sshkeys/cluster.id_rsa.pub"
cluster_ssh_key = "salt://sshkeys/cluster.id_rsa"
ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:ha-clustering:sap-deployments:v8/"
provisioning_log_level = "info"
pre_deployment = true
bastion_enabled = false
hana_inst_master = "<GCP BUCKET>/HANA/2.0/SPS05/51054623"
hana_master_password = "YourSAPPassword1234"
hana_primary_site = "NUE"
hana_secondary_site = "FRA"

I see that we use almost the same configuration. Please ensure that you are using the most recent version, 8.1.0, on the master branch. I would suggest using a fresh clone to ensure there are no configuration conflicts, or at least executing git pull before starting your deployment. For example:
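A minimal sketch of that fresh start (the repository URL is the project's GitHub URL referenced later in this thread; 8.1.0 is assumed to be the released tag):

git clone https://github.com/SUSE/ha-sap-terraform-deployments.git
cd ha-sap-terraform-deployments/gcp
git describe --tags      # verify you are on (or after) 8.1.0
# or, inside an existing clone:
git pull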

ab-mohamed avatar May 19 '22 15:05 ab-mohamed

git pull tells me "already up to date"

tstaerk avatar May 19 '22 15:05 tstaerk

Can you please try a fresh clone before digging into the issue?

ab-mohamed avatar May 19 '22 15:05 ab-mohamed

deleted and re-checked out

tstaerk avatar May 19 '22 15:05 tstaerk

OK, your terraform.tfvars and mine are identical, with the exception of passwords, names, and your two extra lines:

hana_primary_site = "NUE" hana_secondary_site = "FRA"

tstaerk avatar May 20 '22 15:05 tstaerk

I repeated with my old terraform.tfvars and I get:

module.hana_node.module.hana-load-balancer[0].google_compute_health_check.health-check: Creating...
╷
│ Error: Error creating Network: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/global/networks/demo-network' already exists, alreadyExists
│
│   with google_compute_network.ha_network[0],
│   on infrastructure.tf line 27, in resource "google_compute_network" "ha_network":
│   27: resource "google_compute_network" "ha_network" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-c/disks/demo-hana-data-1' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.data[1],
│   on modules/hana_node/main.tf line 12, in resource "google_compute_disk" "data":
│   12: resource "google_compute_disk" "data" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-b/disks/demo-hana-data-0' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.data[0],
│   on modules/hana_node/main.tf line 12, in resource "google_compute_disk" "data":
│   12: resource "google_compute_disk" "data" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-b/disks/demo-hana-backup-0' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.backup[0],
│   on modules/hana_node/main.tf line 20, in resource "google_compute_disk" "backup":
│   20: resource "google_compute_disk" "backup" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-c/disks/demo-hana-backup-1' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.backup[1],
│   on modules/hana_node/main.tf line 20, in resource "google_compute_disk" "backup":
│   20: resource "google_compute_disk" "backup" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-c/disks/demo-hana-software-1' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.hana-software[1],
│   on modules/hana_node/main.tf line 28, in resource "google_compute_disk" "hana-software":
│   28: resource "google_compute_disk" "hana-software" {
│
╵
╷
│ Error: Error creating Disk: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-b/disks/demo-hana-software-0' already exists, alreadyExists
│
│   with module.hana_node.google_compute_disk.hana-software[0],
│   on modules/hana_node/main.tf line 28, in resource "google_compute_disk" "hana-software":
│   28: resource "google_compute_disk" "hana-software" {
│
╵
╷
│ Error: Error creating HealthCheck: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/global/healthChecks/demo-hana-health-check' already exists, alreadyExists
│
│   with module.hana_node.module.hana-load-balancer[0].google_compute_health_check.health-check,
│   on modules/load_balancer/main.tf line 5, in resource "google_compute_health_check" "health-check":
│   5: resource "google_compute_health_check" "health-check" {
╵

tstaerk avatar May 20 '22 15:05 tstaerk

after deleting all the stuff above and re-starting terraform apply, I now get:

╷
│ Error: Error creating InstanceGroup: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-b/instanceGroups/demo-hana-primary-group' already exists, alreadyExists
│
│   with module.hana_node.google_compute_instance_group.hana-primary-group,
│   on modules/hana_node/main.tf line 60, in resource "google_compute_instance_group" "hana-primary-group":
│   60: resource "google_compute_instance_group" "hana-primary-group" {
│
╵
╷
│ Error: Error creating InstanceGroup: googleapi: Error 409: The resource 'projects/thorstenstaerk-suse-terraforms/zones/europe-west1-c/instanceGroups/demo-hana-secondary-group' already exists, alreadyExists
│
│   with module.hana_node.google_compute_instance_group.hana-secondary-group,
│   on modules/hana_node/main.tf line 66, in resource "google_compute_instance_group" "hana-secondary-group":
│   66: resource "google_compute_instance_group" "hana-secondary-group" {
│
╵
╷
│ Error: file provisioner error
│
│   with module.hana_node.null_resource.hana_node_provisioner[1],
│   on modules/hana_node/salt_provisioner.tf line 23, in resource "null_resource" "hana_node_provisioner":
│   23: provisioner "file" {
│
│ timeout - last error: SSH authentication failed ([email protected]:22): ssh: handshake failed:
│ ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
╵
╷
│ Error: file provisioner error
│
│   with module.hana_node.null_resource.hana_node_provisioner[0],
│   on modules/hana_node/salt_provisioner.tf line 23, in resource "null_resource" "hana_node_provisioner":
│   23: provisioner "file" {
│
│ timeout - last error: SSH authentication failed ([email protected]:22): ssh: handshake failed:
│ ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
╵

tstaerk avatar May 20 '22 16:05 tstaerk

ssh cannot work, as Cloud Shell does not have a network connection to a host inside a GCP project

tstaerk avatar May 20 '22 16:05 tstaerk

130.211.104.240 is demo-vmhana01

tstaerk avatar May 20 '22 16:05 tstaerk

@tstaerk, please execute the terraform destroy command to destroy your environment before any new attempt to create a new environment with the terraform apply command.

When you ssh to the HANA node using its public IP address, you need to use the SSH private key that is set in the terraform.tfvars file. Here is the command format:

ssh -i <SSH PRIVATE KEY> root@<HANA_NODE_PUBLIC_IP_ADDRESS>
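With the paths from the tfvars shared earlier in this thread, and the demo-vmhana01 address identified above, the sequence would look roughly like this (a sketch, not project tooling):

terraform destroy
terraform apply
ssh -i /home/admin_/.ssh/id_rsa [email protected]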

ab-mohamed avatar May 20 '22 16:05 ab-mohamed

Hi, I do not call ssh. I get an error that ssh is not possible and I think this is because of the isolation between cloud shell and VMs.

tstaerk avatar May 20 '22 18:05 tstaerk

ok, makes sense - you use the public IP address. Here is what I get:

admin_@cloudshell:~$ ssh -i .ssh/id_rsa [email protected]
The authenticity of host '130.211.104.240 (130.211.104.240)' can't be established.
ECDSA key fingerprint is SHA256:YgYUATM68uQX/KEEXAqXUm18U+BMR9/1M1iDic7PfVI.
Are you sure you want to continue connecting (yes/no/[fingerprint])? Host key verification failed.

tstaerk avatar May 20 '22 19:05 tstaerk

Three possible troubleshooting steps:

  1. Ensure that the public SSH key is attached to the two HANA nodes. If not, attach it manually and try it again.
  2. Ensure that the SSH key pair files have the proper permissions: public key -> 600, private key -> 400.
  3. Try using the -v option with the SSH command to gather more information. A sketch of steps 2 and 3 follows this list.
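A rough sketch of steps 2 and 3 from the client machine, using the key paths from the tfvars in this thread and the node IP mentioned above:

chmod 600 /home/admin_/.ssh/id_rsa.pub
chmod 400 /home/admin_/.ssh/id_rsa
ssh -v -i /home/admin_/.ssh/id_rsa [email protected]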

ab-mohamed avatar May 23 '22 06:05 ab-mohamed

Two questions come to mind:

  • what is salt://sshkeys/cluster.id_rsa.pub? Where does it come from? Can I check if mine is right?
  • you said it worked for you, and it uses ssh. So you must have a firewall rule, right?

tstaerk avatar May 26 '22 09:05 tstaerk

ok, makes sense - you use the public IP address. Here is what I get:

admin_@cloudshell:~$ ssh -i .ssh/id_rsa [email protected]
The authenticity of host '130.211.104.240 (130.211.104.240)' can't be established.
ECDSA key fingerprint is SHA256:YgYUATM68uQX/KEEXAqXUm18U+BMR9/1M1iDic7PfVI.
Are you sure you want to continue connecting (yes/no/[fingerprint])? Host key verification failed.

It is perfectly fine that this fails. Just make sure you delete the old host key from your known_hosts. A bit more context: https://linuxhint.com/host-key-verification-failed-mean/
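One way to drop the stale entry is the standard OpenSSH helper (nothing project-specific):

ssh-keygen -R 130.211.104.240        # remove the old key for this host from known_hosts
ssh -i .ssh/id_rsa [email protected]     # then retry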

yeoldegrove avatar May 30 '22 06:05 yeoldegrove

Two questions come to mind:

  • what is salt://sshkeys/cluster.id_rsa.pub? Where does it come from? Can I check if mine is right?

This is the cluster's ssh key. Normally you don't have to tamper with this.

  • you said it worked for you, and it uses ssh. So you must have a firewall rule, right?

You CAN connect via ssh/port-22 so this will not be a firewall issue.

@tstaerk The ssh keys that are used by terraform to connect via ssh and run salt are these:

public_key = "/home/admin_/.ssh/id_rsa.pub"
private_key = "/home/admin_/.ssh/id_rsa"

Did you create these keys, and are you also using them in your test?

yeoldegrove avatar May 30 '22 06:05 yeoldegrove

@tstaerk In addition to @yeoldegrove's notes and questions, you may manually attach the SSH public key to your nodes as a troubleshooting step.
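A rough sketch of doing that from Cloud Shell (standard gcloud only, not part of this project; the node names and zones are guesses based on this thread and may differ in your deployment):

for node in "demo-vmhana01 europe-west1-b" "demo-vmhana02 europe-west1-c"; do
  set -- $node
  # log in with your own gcloud/OS Login identity, then append the Terraform
  # public key to root's authorized_keys
  gcloud compute ssh "$1" --zone "$2" --command \
    "sudo mkdir -p /root/.ssh && echo '$(cat /home/admin_/.ssh/id_rsa.pub)' | sudo tee -a /root/.ssh/authorized_keys"
done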

ab-mohamed avatar May 30 '22 07:05 ab-mohamed

added the authorized_keys file manually to both nodes, now the install looks like it's doing something!

tstaerk avatar May 31 '22 09:05 tstaerk

install finished, hdbsql answers my SQL queries. Please make sure the authorized_keys get created automatically!

tstaerk avatar May 31 '22 10:05 tstaerk

@tstaerk There is of course already code that handles this: https://github.com/SUSE/ha-sap-terraform-deployments/blob/main/gcp/modules/hana_node/main.tf#L155 Are you sure you created the key files and set the correct variables in terraform.tfvars?

yeoldegrove avatar May 31 '22 12:05 yeoldegrove

reproducing it now

tstaerk avatar Jun 03 '22 14:06 tstaerk

@yeoldegrove: looking at https://github.com/SUSE/ha-sap-terraform-deployments/blob/main/gcp/modules/hana_node/main.tf#L155, you only add the ssh key to the instance's metadata, so passwordless ssh login would only work if the project is set to os_login=false, right? Have you ever tested it with os_login=true?

tstaerk avatar Jun 06 '22 10:06 tstaerk

@tstaerk I still do not get which exact problem you're having and trying to solve. Could you elaborate on that?

ssh keys are added to the instance's metadata the usual way, as you pointed out. Are you using the "Cloud Console"? AFAIK most users use their workstations to deploy this. https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#metadata points out that keys added by the Cloud Console will be removed. Maybe this is your issue?

Also, I am not sure what you mean by os_login=true/false. Where would I set this?

yeoldegrove avatar Jun 07 '22 07:06 yeoldegrove

you would go to the Cloud Console, search for "Metadata", select it, and there set the key os_login to the value false. Then the ssh key set in https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#metadata will be respected.

tstaerk avatar Jun 08 '22 19:06 tstaerk

@tstaerk are you talking about https://console.cloud.google.com/compute/metadata where I could set e.g. https://cloud.google.com/compute/docs/oslogin/set-up-oslogin ?

Just so that I do not miss anything... Could you please sum up what exactly is not working for you (your use case) and how exactly you solved it?

Would just setting enable-oslogin=FALSE in https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#metadata fix it for you?
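For illustration, a stand-alone sketch of what that metadata setting looks like on a google_compute_instance (all names and values here are hypothetical and not taken from the project's module; note that an org policy enforcing constraints/compute.requireOsLogin can still block or override a per-instance FALSE):

resource "google_compute_instance" "oslogin_test" {
  name         = "oslogin-test"
  machine_type = "n1-standard-1"
  zone         = "europe-west1-b"

  boot_disk {
    initialize_params {
      image = "suse-sap-cloud/sles-15-sp2-sap"
    }
  }

  network_interface {
    network = "default"
    access_config {}
  }

  metadata = {
    "enable-oslogin" = "FALSE"                                          # disable OS Login for this instance
    "ssh-keys"       = "root:${file("/home/admin_/.ssh/id_rsa.pub")}"   # classic metadata-based key injection
  }
}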

yeoldegrove avatar Jun 10 '22 08:06 yeoldegrove

We found the error: we had an organisation policy (constraints/compute.requireOsLogin) active that enforced every project to have enable-oslogin=true.

This led to the ssh error [screenshot]. Also, key verification was not the problem:

admin_@cloudshell:~/ha-sap-terraform-deployments/gcp (tstaerk-tf-demo)$ ssh -o StrictHostKeyChecking=no [email protected]
Warning: Permanently added '34.79.69.80' (ECDSA) to the list of known hosts.
[email protected]: Permission denied (publickey).

The issue was that the public ssh key was not automatically added to the HANA nodes' authorized_keys. To change this, we set enable-oslogin=false in the project metadata, as in the screenshot below:

[screenshot: project metadata showing enable-oslogin = false]

Then ssh'ing worked and the key could be found in authorized_keys:

admin_@cloudshell:~/ha-sap-terraform-deployments/gcp (tstaerk-tf-demo)$ ssh -o StrictHostKeyChecking=no [email protected]
SUSE Linux Enterprise Server 15 SP2 for SAP Applications x86_64 (64-bit)

As "root" (sudo or sudo -i) use the:
  - zypper command for package management
  - yast command for configuration management

Management and Config: https://www.suse.com/suse-in-the-cloud-basics
Documentation: https://www.suse.com/documentation/sles-15/
Community: https://community.suse.com/

Have a lot of fun...
demo-hana02:~ # cat .ssh/authorized_keys
# Added by Google
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfWjWgE1NkXnmv0UgAkm+zHnJ2UJgTVpMEAlc3Fo+tH6U1BsPL++ceiE+mAAjcT41j7Ew5N4qyranPSTQOvrLSGvCITP4edAJlbrh4JOzy5/aNP/EfWZiprtytrkdBEzd0gbhg+Bh98FlEUoxLtZSFsP2090zI7hTuT9DEB3eQknMkR9g+JsgGcDd0t4kdERaLZp+spkPCJF3LQ2h+9ZbmHqwBjzYLsJLRMma3y+aU80IHONBOEaX+ab+1vR1CuxMBwRjSlDkfRVBuxMWnj+ipQaLjiMLFaGbANFxPFj4AaeDnYO/jnKUaIRQOEAvpgjN9r5hVsRT0I+cpBvTpqcrx admin_@cs-485070161371-default-boost-wds4w

So, one solution would be to manually copy the public ssh key into the OS's authorized_keys file. Another option could be to check whether constraints/compute.requireOsLogin is enforced and, if so, tell the user that they have to manually copy the ssh key to all nodes.
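For reference, a hedged sketch of how both the policy check and the project-wide metadata change could be done from the CLI (standard gcloud commands, not part of this project; they require the corresponding IAM permissions):

# check whether the org policy is enforced for the project
gcloud resource-manager org-policies describe compute.requireOsLogin \
  --project thorstenstaerk-suse-terraforms

# project-wide equivalent of the console change described above
gcloud compute project-info add-metadata --metadata enable-oslogin=FALSE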

tstaerk avatar Jun 13 '22 09:06 tstaerk