TLS: Support for acorn-dns provided domains

Open cjellick opened this issue 3 years ago • 1 comments

Problem Statement:

We want to give users an easy way to get TLS endpoints for their acorn apps. There are three different use cases where we need to secure endpoints with TLS:

  1. Acorn generated on-acorn.io domains
  2. Domains specified by the user at install time using the --cluster-domain .mydomain.com syntax (which replaces and overrides on-acorn.io)
  3. Domains specified by the user at run time using the --publish hostname:service:port syntax

This issue represents JUST the first use case.

Constraints and assumptions

  • We don’t want to install, manage, and upgrade cert-manager in the user's cluster. So, we want to use libraries to do our own direct ACME-based integration.
  • There is some "prior art" for doing similar things in rio. You can check that code out for ideas/implementation details, here: https://github.com/rancher/rio/tree/e9d490246e7252a04eb872a99f542c5da2f135cf/modules/letsencrypt/pkg. I (Craig) didn't write that code and don't know how useful it will be, but it should at least be reviewed for ideas.

Background on acorn-dns

To better understand this feature, here's some background on how acorn-dns works.

Right now, when you install acorn, an on-acorn.io subdomain is provisioned for your cluster. It will be like: <random slug>.on-acorn.io. This is done as a handler, watching the acorn configMap that contains the installer's config choices, here: https://github.com/acorn-io/acorn/blob/main/pkg/controller/dns/config.go#L36

When you then create an application, acorn makes a request to our acorn-dns service to create a DNS entry for the containers in the application. The domain will look like: container-name.app-name.<random slug>.on-acorn.io. All this DNS logic is driven by the ingress that acorn creates for the app. That logic is here: https://github.com/acorn-io/acorn/blob/main/pkg/controller/dns/ingress.go

Note: if your cluster type is a "local" one like rancher desktop, docker desktop, or minikube, we will still reserve a domain for you (which doesn't actually create any real DNS entries), but when you launch an application, we won't create a DNS entry for it. We have a wildcard *.local.on-acorn.io entry that always resolves to 127.0.0.1, and we'll use that for your local app.

Proposed solution

We DON'T want to create a cert per acorn application. We want ONE wildcard cert per cluster. We want this cert to be provisioned at the same time/place (roughly) that the <random slug>.on-acorn.io domain is reserved. Again, that is the config handler linked in the background section above.

Assuming the cert is provisioned, we should create TLS endpoints by default for acorn applications using that cert.

One big implication of the wildcard cert: the wildcard can only go one level deep. So if the cert is for *.xyz.on-acorn.io, this is valid: foo.xyz.on-acorn.io but this isn't: foo.bar.xyz.on-acorn.io. Our current app FQDN scheme - generated here - is multiple levels deep. So, we need to refactor it to be a single level deep and separated by hyphens. There are probably some edge cases where we could create conflicts that we need to investigate/discuss.
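
To make the one-level rule concrete, here is a minimal Go sketch. Both function names and the container-app join are assumptions for illustration, not acorn's actual code, and the matcher ignores case folding and other details a real TLS library handles.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesWildcard reports whether host is covered by a wildcard pattern such
// as "*.xyz.on-acorn.io". The "*" stands in for exactly one DNS label, so the
// part of host in front of the shared suffix must be non-empty and dot-free.
// (Hypothetical helper; simplified, case-sensitive comparison.)
func matchesWildcard(pattern, host string) bool {
	if !strings.HasPrefix(pattern, "*.") {
		return false
	}
	suffix := pattern[1:] // e.g. ".xyz.on-acorn.io"
	if !strings.HasSuffix(host, suffix) {
		return false
	}
	label := strings.TrimSuffix(host, suffix)
	return label != "" && !strings.Contains(label, ".")
}

// flattenHost is one possible single-level scheme (an assumption): join the
// container and app names with a hyphen in front of the cluster domain.
func flattenHost(container, app, clusterDomain string) string {
	return container + "-" + app + "." + clusterDomain
}

func main() {
	const cert = "*.xyz.on-acorn.io"
	fmt.Println(matchesWildcard(cert, "foo.xyz.on-acorn.io"))     // true
	fmt.Println(matchesWildcard(cert, "foo.bar.xyz.on-acorn.io")) // false
	// A flattened name stays one level below the cluster domain.
	fmt.Println(matchesWildcard(cert, flattenHost("web", "myapp", "xyz.on-acorn.io"))) // true
}
```

The flattened form keeps every app hostname inside the single wildcard, at the cost of the naming conflicts mentioned above.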

Don't forget that we will need to handle renewing the certificate.
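
As a rough sketch of the renewal check, something like the following could run periodically in a controller. The 30-day window and the names needsRenewal and mustParse are assumptions for illustration; the issue does not specify a renewal policy.

```go
package main

import (
	"fmt"
	"time"
)

// renewBefore is an assumed renewal window: start renewing once the
// certificate is within 30 days of expiry.
const renewBefore = 30 * 24 * time.Hour

// needsRenewal reports whether a certificate expiring at notAfter should be
// renewed at time now. (Hypothetical helper, not acorn's implementation.)
func needsRenewal(notAfter, now time.Time) bool {
	return !now.Before(notAfter.Add(-renewBefore))
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
// It only exists to keep the example short.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	notAfter := mustParse("2022-09-18T00:00:00Z")
	fmt.Println(needsRenewal(notAfter, mustParse("2022-07-01T00:00:00Z"))) // false
	fmt.Println(needsRenewal(notAfter, mustParse("2022-08-25T00:00:00Z"))) // true
}
```

In practice notAfter would come from the NotAfter field of the parsed x509 certificate stored for the cluster.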

cjellick, Jun 20 '22 14:06

My only comment is about container.app.slug.on-acorn.io not fitting the wildcard, as you mention at the end.

We'll need to have strategies to disambiguate edge cases:

  • Container or app names can contain hyphens, so a joined name can conflict with a different pair (e.g. container="foo", app="bar-baz" vs container="foo-bar", app="baz")
  • Could add constraints to limit container+app to 31 chars each, so combining them can't ever be too long. But might be too restrictive.
  • If no hard constraint, a label can be too long, so must be shortened to fit into 63 chars somehow
  • The new shortened label might now conflict with an existing entry and need to be changed again somehow.
  • It's documented already, but we could push people more towards using default as the name of the service they want exposed, so the name produced is just the shorter app.slug.on-acorn.io. Or do that automatically if there's only one container, or have a field to set the default instead of requiring it to be called default.
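
The first few points can be illustrated with a short Go sketch. It shows the hyphen conflict from the example above, then one possible mitigation: truncate the joined name and append a short hash of the (container, app) pair. The function names, the "/" separator, and the 8-character hash length are all assumptions; this is not necessarily the scheme acorn adopted, and a short hash makes conflicts unlikely rather than impossible.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// maxLabelLen is the DNS limit on the length of a single label (RFC 1035).
const maxLabelLen = 63

// naiveLabel joins container and app with a hyphen and nothing else.
func naiveLabel(container, app string) string {
	return container + "-" + app
}

// safeLabel is a hypothetical alternative: it appends 8 hex characters of a
// SHA-256 hash of the pair and truncates the readable part so the whole
// label fits into 63 characters.
func safeLabel(container, app string) string {
	// "/" cannot appear in a DNS label, so ("foo", "bar-baz") and
	// ("foo-bar", "baz") are hashed as different inputs.
	sum := sha256.Sum256([]byte(container + "/" + app))
	suffix := "-" + hex.EncodeToString(sum[:])[:8]

	base := naiveLabel(container, app)
	if len(base) > maxLabelLen-len(suffix) {
		base = base[:maxLabelLen-len(suffix)]
	}
	return base + suffix
}

func main() {
	// The naive join maps two different pairs to the same label.
	fmt.Println(naiveLabel("foo", "bar-baz") == naiveLabel("foo-bar", "baz")) // true

	// With the hash suffix the two labels differ unless the 8-character
	// hash prefixes happen to collide.
	fmt.Println(safeLabel("foo", "bar-baz"))
	fmt.Println(safeLabel("foo-bar", "baz"))
}
```

The trade-off is readability: every hostname gains an opaque suffix, which is part of why steering users towards the shorter default service name may still be preferable.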

vincent99, Sep 08 '22 00:09

Validated! 🥳

cjellick, Oct 04 '22 15:10