
FR: Cache tailscale cert and key data to kube state secret

Open dcarrion87 opened this issue 1 year ago • 2 comments

What are you trying to do?

  • When obtaining a cert via the socket integration (e.g. Caddy), certs are stored in the state dir. With the default run.sh this is /tmp.
  • When the pod is recycled, the state dir is emptied and the cert is requested again.
  • If this happens too often, it hits the Let's Encrypt rate limit for that name.

How should we solve this?

  • It would be great if the certificate and key data were cached in the kube state secret, similar to the node key. tailscaled could write the latest cert data there, then check for and use the cached cert before requesting a new one.
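As a rough illustration of the "check before requesting" behavior, here is a minimal shell sketch. It does not reflect how tailscaled works internally. The function name `cert_cached`, the directory, and the domain are all hypothetical, and the `<domain>.crt` / `<domain>.key` file naming is an assumption about what ends up in the cert directory.

```shell
#!/bin/sh
# Sketch only: succeed when a cached cert/key pair already exists.
# Returns 0 when both files are present and non-empty, 1 otherwise.
cert_cached() {
  dir=$1
  domain=$2
  [ -s "$dir/$domain.crt" ] && [ -s "$dir/$domain.key" ]
}

# Example usage with placeholder values (both are assumptions):
if cert_cached "/tmp/certs" "host.example.ts.net"; then
  echo "cached certificate found, no new request needed"
else
  echo "no cached certificate, a new one would be requested"
fi
```

A real implementation would also need to check that the cached certificate has not expired before reusing it.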

What is the impact of not solving this?

  • Having to change the machine name.
  • Creating a custom caching mechanism outside of tailscaled.

Anything else?

No response

dcarrion87 avatar Sep 17 '22 12:09 dcarrion87

Where's your node key secret being written to? Which Tailscale state storage are you using?

bradfitz avatar Sep 17 '22 14:09 bradfitz

Where's your node key secret being written to? Which Tailscale state storage are you using?

        name  = "TS_KUBE_SECRET"
        value = "tailscale-state"

@bradfitz the default one, passed to the tailscale container on start via the env var TS_KUBE_SECRET=tailscale-state.

tailscaled starts with --state=kube:tailscale-state when used with the default run.sh

Checking the Kubernetes secret that it writes to shows:

data:
  _daemon: REDACTED
  _machinekey: REDACTED
  _nl-node-key: REDACTED

I wonder if certs could be cached here too?

dcarrion87 avatar Sep 17 '22 15:09 dcarrion87

@bradfitz this is the main issue when certs are not cached.

If something goes wrong with the workload and causes a succession of requests without checking a cache, this happens:

2022/10/11 23:26:42 cert("REDACTED.REDACTED.ts.net"): getCertPEM: 429 urn:ietf:params:acme:error:rateLimited: Error creating new order :: too many certificates (5) already issued for this exact set of domains in the last 168 hours: REDACTED.REDACTED.ts.net, retry after 2022-10-12T17:32:02Z: see https://letsencrypt.org/docs/duplicate-certificate-limit/

This means we need to force a hostname change to get it going again. We are also running into this:

2022/10/12 01:09:34 cert("REDACTED.REDACTED.ts.net"): getCertPEM: acme.Register: 429 urn:ietf:params:acme:error:rateLimited: Error creating new account :: too many registrations for this IP: see https://letsencrypt.org/docs/too-many-registrations-for-this-ip/

Any ideas on managing this with Tailscale while this isn't supported would be much appreciated.
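One small stopgap is to read the "retry after" time out of the error so a restart loop can back off instead of burning more requests. This is a minimal sketch, assuming the log line keeps the format shown above. The helper name `retry_after` is hypothetical, and the sample line is abbreviated with "..." for readability.

```shell
#!/bin/sh
# Sketch only: print the "retry after" timestamp found on stdin,
# or print nothing when the line carries no such field.
retry_after() {
  sed -n 's/.*retry after \([0-9-]*T[0-9:]*Z\).*/\1/p'
}

# Abbreviated copy of the rate-limit error from the log above.
log='cert("REDACTED.REDACTED.ts.net"): getCertPEM: 429 urn:ietf:params:acme:error:rateLimited: ... retry after 2022-10-12T17:32:02Z: see https://letsencrypt.org/docs/duplicate-certificate-limit/'

printf '%s\n' "$log" | retry_after
```

The "too many registrations for this IP" error carries no retry time, so the helper prints nothing for it.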

dcarrion87 avatar Oct 11 '22 23:10 dcarrion87

In case this helps anyone else running into this on a k8s cluster.

Using the bitnami/kubectl image...

Create a sidecar that watches for cert changes and writes them to a secret:

# Fingerprint the cert files: base64 of the md5sums (or of whatever md5sum returns, errors included)
oldsum=$(md5sum "${cert_path}"/* 2>&1 | base64)
while true; do
  sleep 10
  # Recompute the fingerprint
  newsum=$(md5sum "${cert_path}"/* 2>&1 | base64)
  # Nothing changed, keep polling
  [[ "$oldsum" == "$newsum" ]] && continue
  # Something changed: remember the new fingerprint and update the secret
  oldsum=$newsum
  # Render the secret client-side and apply it; "|| :" keeps the loop alive on failure
  kubectl create secret generic tailscale-cert \
    --save-config \
    --dry-run=client \
    --from-file="${cert_path}/acme-account.key.pem" \
    --from-file="${cert_path}/${domain}.key" \
    --from-file="${cert_path}/${domain}.crt" \
    -o yaml | kubectl apply -f - || :
done

Create an init container that loops through the files in the secret and writes them to the tailscale cert path:

mkdir -p "${cert_path}"
# Get the file data as pipe-delimited name|base64 lines
lines=$(kubectl get secret tailscale-cert -o jsonpath="{.data}" | jq -r 'keys[] as $k | "\($k)|\(.[$k])"')
# Loop through the lines and write the data to their original paths.
# Note: $${...} looks like template escaping for a literal ${...} (for example in Terraform);
# use a single $ if you run this as a plain shell script.
for l in $lines; do
  data=$${l#*|}
  file=$${l%%|*}
  echo "$data" | base64 -d > "${cert_path}/$file" || :
done
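To make the two parameter expansions in that loop easier to follow, here is a small standalone sketch written as plain shell (single `$`). The entry `example.crt|aGVsbG8=` is a made-up sample, not real cert data, and `split_entry` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: split one "name|base64" entry the same way the loop does.
split_entry() {
  l=$1
  file=${l%%|*}   # drop the longest suffix starting at the first "|": the file name
  data=${l#*|}    # drop the shortest prefix ending at the first "|": the base64 payload
  printf '%s\n' "$file" "$data"
}

split_entry 'example.crt|aGVsbG8='

# Decoding the sample payload yields its original bytes.
printf '%s' 'aGVsbG8=' | base64 -d
```

The base64 alphabet does not include `|`, so the first `|` on each line always separates the name from the payload.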

dcarrion87 avatar Oct 13 '22 22:10 dcarrion87

Similar issue: tsnet needs to store certificates in memory: https://github.com/tailscale/tailscale/issues/4597

DentonGentry avatar Oct 31 '22 01:10 DentonGentry