[ENH] - Investigate cert-manager
Feature description
We can look into adding cert-manager: https://cert-manager.io
Value and/or benefit
It'll allow users to get certs from various providers (besides Let's Encrypt).
Anything else?
(Originally proposed by @costrouc)
We need to figure out which issuers to support (other than Let's Encrypt, which is listed as ACME). Any votes?
The implementation can be broken into several parts, one per Issuer, starting with Let's Encrypt. To resolve this ticket, we should have the following:
- Figure out the list of issuers we need to support to begin with, besides Let's Encrypt.
- A written road map of how we plan to incorporate Issuers. This should involve documenting any Nebari configuration/CLI changes if needed.
- Individual tickets for the first few Issuers we intend to support. We should list all high-level changes and dependencies.
- POC with Traefik (as ingress) + cert-manager (as certificate manager) + Let's Encrypt (as issuer) + http01 (as solver).
I tried deploying nebari with the following config:
...
...
certificate:
  type: existing
  # acme_email: [email protected]
  # acme_server: https://acme-v02.api.letsencrypt.org/directory
  secret_name: tls-whoami-ingress-http
Installed the cert-manager and its CRDs using the following command:
helm install cert-manager \
  --namespace dev \
  --version v1.14.4 jetstack/cert-manager \
  --set installCRDs=true
Installed issuer:
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: default
spec:
  acme:
    email: [email protected]
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: quansight-issuer-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: traefik
        selector:
          dnsNames:
            - 'whoami.at.quansight.com'
EOF
Created a certificate:
kubectl apply -f - <<EOF
kind: Certificate
apiVersion: cert-manager.io/v1
metadata:
  name: nlb-lab-tls-cert
  namespace: dev
spec:
  commonName: at.quansight.dev
  dnsNames:
    - at.quansight.dev
  duration: 2160h0m0s
  issuerRef:
    name: default
    kind: ClusterIssuer
  renewBefore: 360h0m0s
  secretName: tls-whoami-ingress-http
EOF
I can see that the challenge is generated, but it is stuck on:
Waiting for HTTP-01 challenge propagation: wrong status code '404', expected '200'
Deploying Nebari with the following config change for self-signed:
diff --git a/src/_nebari/stages/kubernetes_ingress/template/modules/kubernetes/ingress/main.tf b/src/_nebari/stages/kubernetes_ingress/template/modules/kubernetes/ingress/main.tf
index 08bb5b2..641d040 100644
--- a/src/_nebari/stages/kubernetes_ingress/template/modules/kubernetes/ingress/main.tf
+++ b/src/_nebari/stages/kubernetes_ingress/template/modules/kubernetes/ingress/main.tf
@@ -2,6 +2,7 @@ locals {
   default_cert = [
     "--entrypoints.websecure.http.tls.certResolver=default",
     "--entrypoints.minio.http.tls.certResolver=default",
+    "--certificatesresolvers.default.acme.httpchallenge=true",
   ]
   certificate-settings = {
     lets-encrypt = [
From the ingress I can see that we have the following challenge URL:
at.quansight.dev/.well-known/acme-challenge/WP3go40E6EB-RdyD_UKJpskvT5v5mHcDkqZmkiQ2Akk, but it is not accessible from the internet.
This needs to be routed to the solver pod.
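To reproduce the self-check from outside the cluster, the challenge URL can be built and probed directly. The following is a minimal sketch, not part of cert-manager or Nebari: the helper names `challenge_url` and `probe_challenge` are hypothetical, and the path layout assumes the standard ACME HTTP-01 convention (`/.well-known/acme-challenge/<token>` served over plain HTTP).

```python
import urllib.error
import urllib.request


def challenge_url(domain: str, token: str) -> str:
    """Build the HTTP-01 challenge URL for a domain and token (plain HTTP)."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def probe_challenge(domain: str, token: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code served at the challenge URL.

    A 200 suggests the request reached the solver pod; a 404 matches the
    "wrong status code '404'" message reported by cert-manager.
    Connection failures (urllib.error.URLError) are left to propagate.
    """
    request = urllib.request.Request(challenge_url(domain, token), method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Non-2xx responses raise HTTPError; the status code is still useful.
        return error.code
```

Calling `probe_challenge("at.quansight.dev", "<token>")` with the token from the pending challenge should return 200 once the route to the solver pod is in place.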
Adding an ingress route:
kubectl apply -f - <<EOF
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: whoami
  namespace: dev
spec:
  entryPoints:
    - /.well-known
  routes:
    - kind: Rule
      match: Host(`whoami.at.quansight.dev`)
      priority: 10
      services:
        - kind: Service
          name: cm-acme-http-solver-9lnm7
          namespace: dev
          passHostHeader: true
          port: 80
          responseForwarding:
            flushInterval: 1ms
          scheme: https
          weight: 10
  tls:
    secretName: tls-whoami-ingress-http
    options:
      name: opt
      namespace: dev
    certResolver: default
    domains:
      - main: at.quansight.dev
EOF
I found this in the Traefik docs.
This explains why I could not convince a Traefik IngressRoute to create the certificate automatically.
I will need to create a temporary endpoint using a Kubernetes Ingress for the initial certificate creation.
cc: @dcmcand
Steps to get the cert-manager working:
- Start Nebari with the following config; note secret_name: tls-ingress-http.
$ tail nebari-config.yaml
  kubernetes_version: '1.29'
  region: eu-west-1
certificate:
  type: existing
  secret_name: tls-ingress-http
dns:
  provider: cloudflare
  auto_provision: true
- Install cert-manager.
helm install cert-manager --namespace dev --version v1.14.5 jetstack/cert-manager --set installCRDs=true
- Apply the following config.
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: cert-manager-cluster-issuer
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: quansight-issuer-account-key
    solvers:
      - http01:
          ingress:
            class: traefik
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: nlb-lab-tls-cert
  namespace: dev
spec:
  commonName: pt.quansight.dev
  secretName: tls-ingress-http
  issuerRef:
    name: cert-manager-cluster-issuer
    kind: ClusterIssuer
  dnsNames:
    - pt.quansight.dev
EOF
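Once the manifests are applied, issuance can be confirmed by checking the `Ready` condition on the Certificate. The snippet below is a minimal sketch of that check; `is_certificate_ready` is a hypothetical helper, and the sample JSON is a hand-written, trimmed stand-in for the output of `kubectl get certificate nlb-lab-tls-cert -n dev -o json`, not real cluster output.

```python
import json


def is_certificate_ready(certificate: dict) -> bool:
    """Return True if a cert-manager Certificate reports a Ready=True condition."""
    conditions = certificate.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


# Hypothetical, trimmed stand-in for:
#   kubectl get certificate nlb-lab-tls-cert -n dev -o json
sample = json.loads("""
{
  "metadata": {"name": "nlb-lab-tls-cert", "namespace": "dev"},
  "status": {"conditions": [{"type": "Ready", "status": "True"}]}
}
""")

print(is_certificate_ready(sample))
```

If the condition stays `False`, the pending Challenge resources in the same namespace are the next place to look.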
Current certificate configuration in Nebari for Let's Encrypt:
certificate:
  acme_email: [email protected]
  type: lets-encrypt
For existing certificates (stored in a Kubernetes secret):
certificate:
  type: existing
  secret_name: tls-ingress-http
After the upgrade to cert-manager, we have a few options:
- Maintain the same contract in the Nebari config but use cert-manager internally. The following will then use Let's Encrypt via cert-manager:
certificate:
  acme_email: [email protected]
  type: lets-encrypt
The following configuration will behave exactly as it does today:
certificate:
  type: existing
  secret_name: tls-ingress-http
- Add another, explicit certificate type:
certificate:
  type: cert-manager
  issuer: lets-encrypt
  acme_email: [email protected]
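The first option could be sketched roughly as follows: Nebari keeps accepting the same `certificate` block and translates it into cert-manager resources internally. This is an illustration under assumptions, not Nebari's actual implementation. The function `cert_manager_manifests`, the resource names `issuer-account-key` and `nebari-tls-cert`, and the `admin@example.com` address are all hypothetical; the manifest shape mirrors the ClusterIssuer and Certificate used in the steps above.

```python
LETS_ENCRYPT_PRODUCTION = "https://acme-v02.api.letsencrypt.org/directory"


def cert_manager_manifests(
    certificate: dict, domain: str, namespace: str = "dev"
) -> list[dict]:
    """Translate a Nebari-style `certificate` block into cert-manager manifests.

    Hypothetical helper: returns an empty list for any type other than
    `lets-encrypt` (for example `existing`), so those paths are untouched.
    """
    if certificate.get("type") != "lets-encrypt":
        return []

    issuer_name = "cert-manager-cluster-issuer"
    issuer = {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": issuer_name},
        "spec": {
            "acme": {
                "server": certificate.get("acme_server", LETS_ENCRYPT_PRODUCTION),
                "email": certificate["acme_email"],
                "privateKeySecretRef": {"name": "issuer-account-key"},
                "solvers": [{"http01": {"ingress": {"class": "traefik"}}}],
            }
        },
    }
    cert = {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": "nebari-tls-cert", "namespace": namespace},
        "spec": {
            "commonName": domain,
            "dnsNames": [domain],
            "secretName": "tls-ingress-http",
            "issuerRef": {"name": issuer_name, "kind": "ClusterIssuer"},
        },
    }
    return [issuer, cert]


# Placeholder email; the user-facing config is unchanged from today.
manifests = cert_manager_manifests(
    {"type": "lets-encrypt", "acme_email": "admin@example.com"}, "pt.quansight.dev"
)
print([m["kind"] for m in manifests])
```

With this shape the user never names cert-manager in the config, which is the point of keeping the existing contract.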
"Maintain the same contract in Nebari config"
This makes more sense to me; users don't need to know which underlying tool generates the certificates. Ideally, this upgrade would have no visible change for the user.
I'm hopeful that cert-manager will help with https://github.com/nebari-dev/nebari/issues/2478 for self-signed certificates. https://github.com/nebari-dev/nebari/pull/2479 fixes that issue for custom certs but not for self-signed certs, since I haven't figured out how to easily grab Traefik's default self-signed cert.
Interesting links:
- Adding multiple solvers - https://cert-manager.io/docs/configuration/acme/#adding-multiple-solver-types
- Metrostar cert-manager-plugin - https://github.com/MetroStar/nebari-cert-manager/blob/main/src/nebari_plugin_cert_manager_chart/template/chart/templates/certificates.yaml
Just adding here that when we try to use Jupyter-scheduler on the NATO deployment (2024.6.1) it fails because of the self-signed cert:
HTTPSConnectionPool(host='nebari-static.cmreclimatechange.nsf', port=443): Max retries exceeded with url: /argo/api/v1/workflows/dev (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1006)')))
Is there any hope for self-signed certs working in the future?
This issue remains open. #2499 is closed but can be referenced for future work.