TLS: Support for acorn-dns provided domains
Problem Statement:
We want to give users an easy way to get TLS endpoints for the acorn apps. There are three different use cases where we need to secure endpoints with TLS:
- Acorn generated on-acorn.io domains
- Domains specified by the user at install time using the --cluster-domain .mydomain.com syntax (which replaces and overrides on-acorn.io)
- Domains specified by the user at run time using the --publish hostname:service:port syntax
This issue represents JUST the first use case.
Constraints and assumptions
- We don’t want to install, manage, and upgrade cert-manager in the user's cluster. So, we want to use libraries to do our own direct ACME-based integration.
- There is some "prior art" for doing similar things in rio. You can check that code out for ideas/implementation details, here: https://github.com/rancher/rio/tree/e9d490246e7252a04eb872a99f542c5da2f135cf/modules/letsencrypt/pkg. I (Craig) didn't write that code and don't know how useful it will be, but it should at least be reviewed for ideas.
Background on acorn-dns
To better understand this feature, here's some background on how acorn-dns works.
Right now, when you install acorn, an on-acorn.io subdomain is provisioned for your cluster. It will be like: <random slug>.on-acorn.io. This is done as a handler, watching the acorn configMap that contains the installer's config choices, here: https://github.com/acorn-io/acorn/blob/main/pkg/controller/dns/config.go#L36
When you then create an application, acorn makes a request to our acorn-dns service to create a DNS entry for the containers in the application. The domain will look like: container-name.app-name.<random slug>.on-acorn.io. All this DNS logic is driven by the ingress that acorn creates for the app. That logic is here: https://github.com/acorn-io/acorn/blob/main/pkg/controller/dns/ingress.go
Note: if your cluster type is a "local" one like rancher desktop, docker desktop, or minikube, we will still reserve a domain for you (which doesn't actually create any real DNS entries), but when you launch an application, we won't create a DNS entry for it. We have a wildcard *.local.on-acorn.io entry that always resolves to 127.0.0.1, and we'll use that for your local app.
Proposed solution
We DON'T want to create a cert per acorn application. We want ONE wildcard cert per cluster. We want this cert to be provisioned at (roughly) the same time and place that the <random slug>.on-acorn.io domain is reserved, i.e. in the handler linked above.
Assuming the cert is provisioned, we should create TLS endpoints by default for acorn applications using that cert.
One big implication of the wildcard cert: the wildcard can only go one level deep. So if the cert is for *.xyz.on-acorn.io, this is valid: foo.xyz.on-acorn.io, but this isn't: foo.bar.xyz.on-acorn.io. Our current app FQDN scheme - generated here - is multiple levels deep. So, we need to refactor it to be a single level deep and separated by hyphens. There are probably some edge cases where we could create conflicts that we need to investigate/discuss.
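To make the one-level rule concrete, here is a minimal Go sketch. The function names matchesWildcard and flatFQDN are hypothetical and not taken from the acorn codebase, and the matching is simplified (it assumes lowercase ASCII hostnames and ignores the other checks a real TLS library performs).

```go
package main

import (
	"fmt"
	"strings"
)

// matchesWildcard reports whether host is covered by a wildcard pattern such
// as "*.xyz.on-acorn.io". The "*" stands for exactly one DNS label.
func matchesWildcard(pattern, host string) bool {
	if !strings.HasPrefix(pattern, "*.") {
		return false
	}
	suffix := pattern[1:] // e.g. ".xyz.on-acorn.io"
	if !strings.HasSuffix(host, suffix) {
		return false
	}
	label := strings.TrimSuffix(host, suffix)
	// A label that is empty or contains a dot is more (or less) than one level.
	return label != "" && !strings.Contains(label, ".")
}

// flatFQDN is a hypothetical helper: it joins container and app with a hyphen
// so the result is a single label under the cluster domain.
func flatFQDN(container, app, clusterDomain string) string {
	return container + "-" + app + "." + clusterDomain
}

func main() {
	const cert = "*.xyz.on-acorn.io"
	hosts := []string{
		"foo.xyz.on-acorn.io",
		"foo.bar.xyz.on-acorn.io",
		flatFQDN("foo", "bar", "xyz.on-acorn.io"),
	}
	for _, host := range hosts {
		fmt.Printf("%-28s covered=%v\n", host, matchesWildcard(cert, host))
	}
}
```

In this sketch the first and third hosts are covered by the wildcard and the second is not, which is why the multi-level scheme has to be flattened.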
Don't forget that we will need to handle renewing the certificate.
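As a rough illustration of the renewal check, the sketch below decides whether a certificate is due for renewal based on its validity window. The needsRenewal name, the one-third-of-lifetime threshold, and the example dates are all assumptions for illustration; the actual policy is still to be decided.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"time"
)

const day = 24 * time.Hour

// Example certificate with a 90-day lifetime (the dates are made up).
var (
	issued = time.Date(2022, 6, 1, 0, 0, 0, 0, time.UTC)
	cert   = &x509.Certificate{NotBefore: issued, NotAfter: issued.Add(90 * day)}
)

// needsRenewal reports whether less than a third of the certificate's
// lifetime remains at time now. The one-third threshold is an assumption.
func needsRenewal(cert *x509.Certificate, now time.Time) bool {
	lifetime := cert.NotAfter.Sub(cert.NotBefore)
	renewAt := cert.NotAfter.Add(-lifetime / 3)
	return !now.Before(renewAt)
}

func main() {
	fmt.Println("day 30:", needsRenewal(cert, issued.Add(30*day)))
	fmt.Println("day 61:", needsRenewal(cert, issued.Add(61*day)))
}
```

A controller could run a check like this periodically and re-run the ACME flow when it returns true.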
My only comment is about container.app.slug.on-acorn.io not fitting wildcards, as you mention at the end.
We'll need to have strategies to disambiguate edge cases:
- Container or app names can contain hyphens, so two different pairs can produce the same name (e.g. container="foo", app="bar-baz" vs container="foo-bar", app="baz")
- Could add constraints to limit container+app to 31 chars each, so combining them can't ever be too long. But that might be too restrictive.
- If no hard constraint, a label can be too long, so must be shortened to fit into 63 chars somehow
- The new shortened label might now conflict with an existing entry and need to be changed again somehow.
- It's documented already, but we could push people more towards using default as the name of the service they want exposed, so the name produced is just the shorter app.slug.on-acorn.io. Or do that automatically if there's only one container, or have a field to set the default instead of requiring it to be called default.
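One possible way to handle both the hyphen ambiguity and the 63-char limit is to append a short hash of the (container, app) pair and truncate the readable part to fit. This is only a sketch of one option, not a decided design. The label function, the 8-character suffix, and the "/" separator are assumptions; it also assumes names are ASCII and never contain "/".

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const maxLabelLen = 63 // maximum length of a single DNS label

// label is a hypothetical helper that builds "<container>-<app>-<hash>".
// The hash covers the pair joined by "/", which is assumed never to appear
// in a name, so ("foo", "bar-baz") and ("foo-bar", "baz") hash differently.
func label(container, app string) string {
	sum := sha256.Sum256([]byte(container + "/" + app))
	suffix := "-" + hex.EncodeToString(sum[:])[:8]
	name := container + "-" + app
	// Truncate the readable part so the full label fits in 63 chars.
	if limit := maxLabelLen - len(suffix); len(name) > limit {
		name = name[:limit]
	}
	return name + suffix
}

func main() {
	fmt.Println(label("foo", "bar-baz"))
	fmt.Println(label("foo-bar", "baz"))
}
```

Both calls share the readable prefix foo-bar-baz but end in different suffixes. The trade-off is that every hostname gets a suffix that is not human-friendly, and an 8-character hash makes collisions very unlikely rather than impossible, so a conflict check would still be needed.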
Validated! 🥳