tectonic-installer
terraform bare metal installation fails repeatedly during certs creation
What keywords did you search in tectonic-installer issues before filing this one?
terraform
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
- Tectonic version (release or commit hash): https://releases.tectonic.com/tectonic-1.6.2-tectonic.1.tar.gz
- Terraform version (terraform version): 0.9.4 0.9.6
- Platform (aws|azure|openstack|metal): metal
What happened?
I was following the bare metal installation readme: https://coreos.com/tectonic/docs/latest/install/bare-metal/metal-terraform.html
I got to the point where I could configure matchbox using terraform and perform simple installations using iPXE (just install CoreOS and add my ssh key).
I then moved on to configuring terraform as described in the "Customize the deployment" section of the doc linked above. Once I had figured out all the required configuration (the doc is out of date), I ran terraform apply -var-file.... and it failed at some of the certificate generation steps.
Error applying plan:
9 error(s) occurred:
* module.bootkube.local_file.kubelet-crt: Resource 'tls_locally_signed_cert.kubelet' not found for variable 'tls_locally_signed_cert.kubelet.cert_pem'
* module.bootkube.data.template_file.kubeconfig: Resource 'tls_locally_signed_cert.kubelet' not found for variable 'tls_locally_signed_cert.kubelet.cert_pem'
* module.tectonic.tls_cert_request.identity-server: unexpected EOF
* module.tectonic.tls_cert_request.ingress: 1 error(s) occurred:
* tls_cert_request.ingress: unexpected EOF
* module.bootkube.tls_locally_signed_cert.apiserver: Resource 'tls_cert_request.apiserver' not found for variable 'tls_cert_request.apiserver.cert_request_pem'
* module.bootkube.tls_cert_request.apiserver: connection is shut down
* module.bootkube.tls_self_signed_cert.kube-ca: connection is shut down
* module.bootkube.tls_locally_signed_cert.kubelet: connection is shut down
* module.tectonic.tls_locally_signed_cert.identity-client: connection is shut down
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
I've tried the process multiple times. Each time it fails slightly differently, but always on something related to certs.
What you expected to happen?
I would expect terraform apply to succeed
How to reproduce it (as minimally and precisely as possible)?
Follow the bare metal installation instructions with terraform (non-graphical) and try to apply the config with terraform apply.
Anything else we need to know?
@mwasilew2 Thanks for your issue report. Do you mind posting your (possibly anonymized) terraform.tfvars so we can triage the issue?
# cat build/my-cluster/terraform.tfvars | grep -v "^//" | grep -v "^$"
tectonic_admin_email = "xxxxxxxx"
tectonic_admin_password_hash = "xxxxxxxx"
tectonic_base_domain = "xxxxxxxxx"
tectonic_cl_channel = "stable"
tectonic_cluster_cidr = "10.2.0.0/16"
tectonic_cluster_name = "my-cluster"
tectonic_etcd_count = "0"
tectonic_experimental = false
tectonic_kube_apiserver_service_ip = "10.3.0.1"
tectonic_kube_dns_service_ip = "10.3.0.10"
tectonic_kube_etcd_service_ip = "10.3.0.15"
tectonic_license_path = "/root/tectonic_license"
tectonic_master_count = "1"
tectonic_metal_cl_version = "1353.7.0"
tectonic_metal_controller_domain = "controller1"
tectonic_metal_controller_domains = ["controller1"]
tectonic_metal_controller_macs = ["xxxxxxxxxxxxxxxx"]
tectonic_metal_controller_names = ["controller1"]
tectonic_metal_ingress_domain = ""
tectonic_metal_matchbox_ca = <<EOD
-----BEGIN CERTIFICATE-----
xxxxxxxxxxxxxx
-----END CERTIFICATE-----
EOD
tectonic_metal_matchbox_client_cert = <<EOD
-----BEGIN CERTIFICATE-----
xxxxxxxxxxxxx
-----END CERTIFICATE-----
EOD
tectonic_metal_matchbox_client_key = <<EOD
-----BEGIN RSA PRIVATE KEY-----
xxxxxxxxxxxxxxx
-----END RSA PRIVATE KEY-----
EOD
tectonic_metal_matchbox_http_url = "xxxxxxxx"
tectonic_metal_matchbox_rpc_endpoint = "xxxxxxxx"
tectonic_metal_worker_domains = ["worker1", "worker2"]
tectonic_metal_worker_macs = ["52:54:00:79:e7:23", "52:54:00:79:e7:22"]
tectonic_metal_worker_names = ["worker1", "worker2"]
tectonic_pull_secret_path = "/root/tectonic_pull_secret.json"
tectonic_service_cidr = "10.3.0.0/16"
tectonic_ssh_authorized_key = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIC0rTJB60Mb3/SL9qw591FDQnfhxXnaQtZBSH3U6urGZ mw5@lenovo-sl300"
tectonic_vanilla_k8s = false
tectonic_worker_count = "2"
Thanks a lot for the tfvars file. Just one thing stands out to me: are /root/tectonic_pull_secret.json and /root/tectonic_license available on the machine you are installing from (not the target machines) and readable by the user that is running the installation?
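A quick way to answer that question is to test both paths as the installing user. The following is only a sketch for a POSIX shell; `check_readable` is a hypothetical helper name, not something shipped with the installer. Note that when run as root the `-r` test generally passes regardless of the file mode, so it mainly catches missing files in that case.

```shell
# Hypothetical helper: report whether each path is a regular file that the
# current user can read. Returns non-zero if any path fails the check.
check_readable() {
  status=0
  for f in "$@"; do
    if [ -f "$f" ] && [ -r "$f" ]; then
      echo "ok: $f"
    else
      echo "NOT readable: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Paths taken from the terraform.tfvars posted above.
check_readable /root/tectonic_license /root/tectonic_pull_secret.json \
  || echo "check the paths and permissions before running terraform apply"
```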
Yes, both files were on the machine from which I was running the tectonic installer, and both were readable by the user.
It worked fine with the graphical installer. Unless you want any more diagnostics, the issue can be closed as far as I'm concerned. It was probably some misconfiguration on my side. Thanks for your help!
Just for the record, I'm pasting the tfvars generated by the GUI installer:
{"tectonic_admin_email":"xxxxxxx",
"tectonic_admin_password_hash":"xxxxxxxxxxxxxxxxx",
"tectonic_base_domain":"unused",
"tectonic_cl_channel":"stable",
"tectonic_cluster_cidr":"10.2.0.0/16",
"tectonic_cluster_name":"my-cluster",
"tectonic_dns_name":"",
"tectonic_experimental":false,
"tectonic_kube_apiserver_service_ip":"10.3.0.1",
"tectonic_kube_dns_service_ip":"10.3.0.10",
"tectonic_kube_etcd_service_ip":"10.3.0.15",
"tectonic_license_path":"./license.txt",
"tectonic_metal_cl_version":"1353.7.0",
"tectonic_metal_controller_domain":"controller1",
"tectonic_metal_controller_domains":["controller1"],
"tectonic_metal_controller_macs":["xxxxxxxx"],
"tectonic_metal_controller_names":["controller1"],
"tectonic_metal_ingress_domain":"worker1",
"tectonic_metal_matchbox_ca":"-----BEGIN CERTIFICATE-----\nMIIFDTw+0kMSW/tfuSm\nSg==\n-----END CERTIFICATE-----\n",
"tectonic_metal_matchbox_client_cert":"-----BEGIN CERTIFICATE-----\nMIIEHWU5gJdeNd1tFIjqout/Spw=\n-----END CERTIFICATE-----\n",
"tectonic_metal_matchbox_client_key":"-----BEGIN RSA PRIVATE KEY-----\nMIGzeAoPm7XdgzTMXQH\n-----END RSA PRIVATE KEY-----\n",
"tectonic_metal_matchbox_http_url":"http://matchbox:8080",
"tectonic_metal_matchbox_rpc_endpoint":"matchbox:8081",
"tectonic_metal_worker_domains":["worker1"],
"tectonic_metal_worker_macs":["52:54:00:79:e7:23"],
"tectonic_metal_worker_names":["worker1"],
"tectonic_pull_secret_path":"./pull_secret.json",
"tectonic_service_cidr":"10.3.0.0/16",
"tectonic_ssh_authorized_key":"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDXrykpoLEnW6bCRN3PXpA+usUvlGQvKC9ArrO+J0/AySn8glBGmn7CwvrIEfYowxpLExpRqQ+uqQ+RULL8lkaCRP97nB+D1JiqlucuC5/yrerfeJuw4Fh4q/Mc1YXw3bEFyGHuOASTrXriI34OVIzFeKBBdMBKnydAQB4FgnOGBB1d6amqV7A8gq67bK8F6SlRXS/eN2Kufk1WS0LaIox1wF1HMcWM/JTm6wGAve8AAhfCcs7xtzU7dUwQ4gmORuJ7ln/RzUrkSwScg7c/ODLKKR5eZskbtzWh9zZeq+VpeCPhKbDWbuIM7EAknZUhwaesgnrRpsYqUeRhZiLBMMYt root@dell-primary\n"}
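Since the hand-written file failed and the GUI-generated one worked, comparing which variables each file sets is one place to start looking. The sketch below assumes both files are saved locally and only compares key names, not values; `keys_hcl`, `keys_json` and `keys_only_in_hcl` are hypothetical helper names. It does not by itself establish the cause of the failure.

```shell
# Hypothetical helper: list the tectonic_* keys assigned in an HCL-style
# terraform.tfvars file (one "key = value" per line).
keys_hcl() {
  grep -oE '^[[:space:]]*tectonic_[a-z0-9_]+' "$1" | sed 's/^[[:space:]]*//' | sort -u
}

# Hypothetical helper: list the tectonic_* keys in a JSON tfvars file.
keys_json() {
  grep -oE '"tectonic_[a-z0-9_]+"[[:space:]]*:' "$1" | grep -oE 'tectonic_[a-z0-9_]+' | sort -u
}

# Print keys set in the hand-written file ($1) but absent from the
# GUI-generated JSON ($2).
keys_only_in_hcl() {
  a=$(mktemp)
  b=$(mktemp)
  keys_hcl "$1" > "$a"
  keys_json "$2" > "$b"
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}
```

With the two files from this thread saved as, say, `manual.tfvars` and `gui.tfvars.json` (file names assumed), `keys_only_in_hcl manual.tfvars gui.tfvars.json` lists the variables that only the manual configuration sets. Values that differ for shared keys, such as tectonic_metal_ingress_domain, still need to be compared by eye.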
Hi,
Sorry for the delay here. Most of us were busy at CoreOS Fest.
The "unexpected EOF" and "connection is shut down" errors look extremely suspicious to me; it sounds like Terraform crashed. Would you mind running TF_LOG=TRACE terraform apply and posting the output?
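Trace output is very verbose, so it can help to capture it in a file and search it for the lines around the failure. A minimal sketch, assuming a POSIX shell: `traced_apply` and `find_crash_lines` are hypothetical helper names, and the var file and log file paths are left as arguments because the original command line was not posted in full.

```shell
# Hypothetical helper: run apply with trace logging and send stderr, where
# Terraform writes its log output, to a file.
# $1 = path to the var file, $2 = path of the log file to create.
# Error messages on stderr end up in the log file too, not on the terminal.
traced_apply() {
  TF_LOG=TRACE terraform apply -var-file="$1" 2> "$2"
}

# Hypothetical helper: print log lines (with line numbers) that hint at a
# crash, including the two error strings reported in this issue.
find_crash_lines() {
  grep -n -i -E 'panic|unexpected EOF|connection is shut down' "$1"
}
```

Running `traced_apply <var file> terraform-trace.log` followed by `find_crash_lines terraform-trace.log` gives a short list of places in the log worth reading and attaching to the issue.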