terraform-azurerm-terraform-enterprise

Handling TLS Cert chain?

Open ryancbutler opened this issue 5 years ago • 16 comments

Expected Behavior

I should be able to include an intermediate certificate while building a cluster, so that a full TLS/SSL chain is presented. I am currently using Let's Encrypt for a wildcard certificate.

Actual Behavior

Only the host certificate is present in the chain; the intermediates are missing.

Steps to Reproduce

Export a PFX including all intermediate and/or root certificates. [image]

Once exported and the TFE cluster is deployed, openssl s_client -connect or the SSL Labs host checker reports chain issues, which break VCS webhooks. [image] [image]
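A minimal sketch for counting the certificates an endpoint actually serves. The helper names `count_pem_certs` and `served_chain_length` and the hostname in the usage note are placeholders for illustration, not part of the module or of the original report.

```shell
#!/bin/sh
# count_pem_certs (hypothetical helper): counts PEM certificate blocks on stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the exit status clean.
count_pem_certs() {
  grep -c -e '-----BEGIN CERTIFICATE-----' || true
}

# served_chain_length HOST [PORT] (hypothetical helper):
# prints how many certificates the server sends during the TLS handshake.
served_chain_length() {
  host="$1"
  port="${2:-443}"
  # -showcerts prints every certificate the server sends, not only the leaf.
  # Closing stdin makes s_client exit once the handshake is done.
  openssl s_client -connect "${host}:${port}" -servername "${host}" -showcerts \
    </dev/null 2>/dev/null | count_pem_certs
}
```

For example, `served_chain_length tfe.example.com 443` (hostname assumed). A result of 1 would suggest that only the leaf certificate is served. A Let's Encrypt leaf plus its intermediate should give at least 2.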

ryancbutler avatar Nov 18 '19 16:11 ryancbutler

Hey @ryancbutler, I suggest you use the template available here to better flesh out the explanation of this fault.

ausfestivus avatar Nov 18 '19 22:11 ausfestivus

Thanks! Hope this works!

ryancbutler avatar Nov 18 '19 22:11 ryancbutler

I attempted to update the cert from https://servername:8800/console/settings, which updates the Replicated console cert but doesn't appear to do anything for the TFE app. Is there seriously no way to update a certificate?

ryancbutler avatar Nov 21 '19 14:11 ryancbutler

Hi Ryan! Apologies for missing this! Updating the certificate there doesn't work the way it did in the non-HA version, because the TLS connection is now terminated at the load balancer. You'll need to update the certificate file used in your Terraform configuration and apply it so that the load balancer gets the new certificate.

We keep that setting there as some folks may want to do their certificate setup differently, but I think we could benefit from a note in the HA version of the application as it is quite confusing!

bnferguson avatar Dec 05 '19 20:12 bnferguson

Ok, updating the cert makes sense. Thanks! What about the chain issue @bnferguson ?

ryancbutler avatar Dec 05 '19 22:12 ryancbutler

Sure! I believe you need to export the entire cert chain for the cert before converting it to PFX, as outlined here: https://docs.microsoft.com/en-us/azure/app-service/configure-ssl-certificate#upload-a-private-certificate. Not the whole thing, just the merging and exporting bits as prep.

bnferguson avatar Dec 09 '19 09:12 bnferguson

Yep, I verified the PFX contains the entire chain before running (see above). Something along the way grabs only the main cert and not the entire chain.
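For reference, a sketch of one way to list what a PFX contains from the command line. The helper names and the `PFX_PASSWORD` environment variable are assumptions made for this example.

```shell
#!/bin/sh
# subject_issuer_lines (hypothetical helper): keeps only the subject=/issuer=
# lines that "openssl pkcs12" prints ahead of each certificate.
subject_issuer_lines() {
  grep -E '^(subject|issuer)=' || true
}

# pfx_chain_summary PFX_FILE (hypothetical helper):
# prints one subject/issuer pair per certificate bundled in the PFX.
# The import password is read from the PFX_PASSWORD environment variable (assumed name).
pfx_chain_summary() {
  # -nokeys leaves the private key out of the output.
  openssl pkcs12 -in "$1" -passin env:PFX_PASSWORD -nokeys 2>/dev/null \
    | subject_issuer_lines
}
```

If the PFX holds the full chain, `pfx_chain_summary wildcard.pfx` (file name assumed) should print a pair for the leaf and a pair for each intermediate or root.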

ryancbutler avatar Dec 09 '19 14:12 ryancbutler

Hmm, interesting. When you see this, is it when connecting to the server, or when the server is reaching out (say, when doing a run or setting up VCS connections)? With the latter you may need to add a CA bundle to the Replicated console, though I'd expect the Let's Encrypt cert to work without it.

bnferguson avatar Dec 09 '19 14:12 bnferguson

I first noticed it when attempting to configure the VCS, since GitHub didn't like the missing cert. After checking the actual server (443 and 8800), I noticed the intermediate was missing. At one point I tried adding it to the CA bundle just for troubleshooting, without any luck.

ryancbutler avatar Dec 09 '19 14:12 ryancbutler

@ryancbutler Been working out some other Azure issues for the last week. Planning on getting to this this week to try to reproduce this.

bnferguson avatar Dec 16 '19 13:12 bnferguson

Sorry for the delay here, I had some other pressing bugs and then the holidays hit. I jumped on this first thing and have reproduced the issue. It's quite baffling! The PFX has the full cert chain, but when it gets served it seems to have been stripped down to just the final cert, without the intermediate.

I'm also working on the 0.12 conversion, so as I do that I'll be looking at how we put together some of the cert things to see what might be causing this. I'm also asking around, as I've not seen a cert do this before. (Well, I have, but it was more like someone forgot to add the line that included the intermediate certs. When they're bundled together, as they are in a PFX or even in some of my experiments with a full chain in the cert, I would expect that not to be possible.)

Anyway, just wanted to give an update that this is still something I'm looking at!

bnferguson avatar Jan 08 '20 11:01 bnferguson

Oh, and on the webhooks side of things, there is a workaround: disable SSL verification on the GitHub side (https://github.com/[owner]/[repository]/settings/hooks, then look for the hook to the TFE install). It's sub-optimal but it gets things working.

I had no issues with the OAuth/org setup in my reproduction.

bnferguson avatar Jan 08 '20 11:01 bnferguson

I have tracked this issue down to how Azure's waagent decodes PFX files from the Key Vault. Apparently it's known behavior that it places only the root and the leaf, skipping any intermediates. But this only happens when you add the certificate to the Vault as a certificate, as opposed to, say, as a secret.

If we go this route, we'll probably change the interface to take a cert and a key like we do with other cloud providers (instead of PFX) and rely less on Azure's method of placing certs on servers.
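As an illustration of what a cert-plus-key interface would need as input, here is a sketch that splits an existing PFX into PEM files with openssl. The helper name `split_pfx`, the output file names, and the `PFX_PASSWORD` environment variable are assumptions for this example, not the module's interface.

```shell
#!/bin/sh
# split_pfx PFX_FILE OUT_DIR (hypothetical helper): writes leaf.pem, chain.pem,
# key.pem and fullchain.pem into OUT_DIR.
# The import password is read from the PFX_PASSWORD environment variable (assumed name).
split_pfx() {
  pfx="$1"
  out="$2"
  mkdir -p "$out" || return 1
  # Leaf certificate only.
  openssl pkcs12 -in "$pfx" -passin env:PFX_PASSWORD -clcerts -nokeys \
    -out "$out/leaf.pem" || return 1
  # Intermediate and root certificates only.
  openssl pkcs12 -in "$pfx" -passin env:PFX_PASSWORD -cacerts -nokeys \
    -out "$out/chain.pem" || return 1
  # Private key, written unencrypted, so restrict its permissions.
  openssl pkcs12 -in "$pfx" -passin env:PFX_PASSWORD -nocerts -nodes \
    -out "$out/key.pem" || return 1
  chmod 600 "$out/key.pem"
  # Leaf first, then the rest of the chain.
  cat "$out/leaf.pem" "$out/chain.pem" > "$out/fullchain.pem"
}
```

The order of the certificates inside `chain.pem` follows whatever order the PFX stores them in, so it is worth checking that `fullchain.pem` runs from leaf to root before using it.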

We're reworking the interfaces of the modules to be much easier to work with along with the 0.12 upgrade, and I think we may fix this issue with that since it'd be changing the interface.

bnferguson avatar Jan 08 '20 14:01 bnferguson

Not sure if I am facing the same or similar issue. Looking for some next steps advice. We want to deploy internally, but I was willing to try public IP usage in an attempt to test the deployment. Further we use an internal CA which has an intermediate. Doesn't look like I can specify the Root CA. Also do I need to do a wildcard cert? I couldn't find documentation on this other than this thread.

Basically the install seems to be stalled. I think the health probes from the LB are failing because of the SSL handshake, and the installer doesn't continue.

From tail -f apiserver:
{"log":"I0121 18:10:03.498299 1 log.go:172] http: TLS handshake error from 168.63.129.16:58994: EOF\n","stream":"stderr","time":"2020-01-21T18:10:03.498488441Z"}

From systemctl status kubelet:
Jan 21 18:12:01 tfe-iofnjq38-primary-0 kubelet[9944]: E0121 18:12:01.720811 9944 kubelet.go:2248] node "tfe-iofnjq38-primary-0" not found

Is there a Terraform module for individual deployment in Azure?

pearcec avatar Jan 21 '20 18:01 pearcec

I did find https://www.terraform.io/docs/enterprise/before-installing/index.html#tls-certificate-and-private-key about using a wildcard -- I also now see information about the CA bundle and private CAs. I will take a look at this.

pearcec avatar Jan 21 '20 18:01 pearcec

I deployed with the private bundle and it picked that up. I could see the log in /tmp/ptfe-customer-certs/. I am confused as to why the load balancer still isn't working. Shouldn't the primary respond on port 6443 with my certificate? I still get the error messages from my earlier comment: https://github.com/hashicorp/terraform-azurerm-terraform-enterprise/issues/51#issuecomment-576809705

root@tfe-810jzzx7-primary-0:/var/log# wget https://10.134.34.6:6443/
--2020-01-22 14:36:20--  https://10.134.34.6:6443/
Connecting to 10.134.34.6:6443... connected.
ERROR: cannot verify 10.134.34.6's certificate, issued by ‘CN=kubernetes’:
  Unable to locally verify the issuer's authority.
To connect to 10.134.34.6 insecurely, use `--no-check-certificate'.
root@tfe-810jzzx7-primary-0:/var/log# openssl s_client -connect 10.134.34.6:6443
CONNECTED(00000003)
depth=0 CN = kube-apiserver
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = kube-apiserver
verify error:num=21:unable to verify the first certificate
verify return:1
---
Certificate chain
 0 s:/CN=kube-apiserver
   i:/CN=kubernetes
---
Server certificate
-----BEGIN CERTIFICATE-----

pearcec avatar Jan 22 '20 14:01 pearcec