
`default_sni` + `fallback_sni` global settings are flimsy

Open polarathene opened this issue 8 months ago • 3 comments

These two global settings don't seem to play well once wildcard cert preference is involved. Neither worked when querying Caddy without SNI (expected default_sni) or with an invalid one (expected fallback_sni).

$ step certificate inspect --insecure --servername invalid-value https://172.18.0.2 | grep DNS
failed to connect: remote error: tls: internal error

$ step certificate inspect --insecure https://172.18.0.2 | grep DNS
failed to connect: remote error: tls: internal error

This will occur when a wildcard certificate is available that the configured SNI setting would match. Instead for it to actually work, the SNI setting would need to be set to *.example.internal (instead of say default.example.internal).
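
For reference, the expectation rests on ordinary hostname verification, under which a cert with the SAN *.example.internal covers default.example.internal. A minimal Go sketch of that check follows. It uses the standard library's x509.Certificate.VerifyHostname on an in-memory struct rather than a real cert. The helper name covers is made up for illustration, and this is not Caddy's own lookup code.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// covers reports whether a certificate carrying the given DNS SANs would be
// valid for host under Go's standard hostname verification rules.
func covers(sans []string, host string) bool {
	// Illustrative in-memory certificate: only the SAN list is populated.
	cert := &x509.Certificate{DNSNames: sans}
	return cert.VerifyHostname(host) == nil
}

func main() {
	wildcard := []string{"*.example.internal"}
	for _, host := range []string{
		"default.example.internal",  // configured default_sni
		"fallback.example.internal", // configured fallback_sni
		"invalid-value",             // bogus SNI used in the reproduction
		"a.b.example.internal",      // a wildcard spans a single label only
	} {
		fmt.Printf("%-28s covered=%v\n", host, covers(wildcard, host))
	}
}
```

Both configured values are covered by the wildcard cert under these rules, so the failure would be consistent with the settings being compared against the literal subject name of a loaded cert instead.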

NOTE: Separately from wildcard cert preference, these two settings fail in the same manner (on both Caddy 2.9 and 2.10, probably earlier versions too) when tls internal (or similar) is not set somewhere in the config. The directive does not need to be related to the configured SNI values. On Caddy 2.9, setting auto_https prefer_wildcard triggers similar logic that also restores the SNI default/fallback functionality.

Additionally:

  • It would be nice for {env.ENV_NAME} support?
  • Caddy L4 doesn't seem to be compatible (presumably these settings are only relevant to the HTTP app?)
  • docs: default_sni should perhaps also mention the expectation of certificate to be provisioned (like fallback_sni)?

Reference

  • I created this bug report based off my earlier observations regarding SNI that I reported here: https://github.com/mholt/caddy-l4/issues/276#issuecomment-2817496708
  • Caddy 2.10.0 now defaults to preferring wildcard certificates. This replaces the experimental global setting auto_https prefer_wildcard from the Caddy 2.9.x series.

This may only be relevant to locally managed certs, but I noticed that the default_sni and fallback_sni global settings are affected by wildcard cert preference, and that they also require a tls directive (at least tls internal, or one loading an external cert). I've not explored whether this affects typical deployments with ACME provisioned certs, or whether other variants of the tls directive likewise provide a workaround.

I have verified this issue persists across both Caddy 2.9.1 and 2.10.0 (and perhaps earlier versions I've not tested). Below is a reproduction compose.yaml for use with Docker Compose. Otherwise you just need Caddy with a Caddyfile and the Smallstep step CLI for verifying the SNI functionality for these two global settings.

I've inlined commentary in the config below related to my findings (apologies for lacking time to better format this bug report). The reproduction commands give an idea of how to verify, but you'll need to go over my Caddyfile config notes below for caveats on what works and what doesn't, plus other related observations that should help pinpoint the cause. (EDIT: Briefly highlighted caveats in report above)

Reproduction

services:
  reverse-proxy:
    container_name: caddy
    image: caddy:2.10.0
    #image: caddy:2.9.1
    environment:
      APEX_DOMAIN: example.internal
    # Configure containers on the same network to resolve `bug.example.internal` to the Caddy container IP:
    networks:
      default:
        aliases:
          - bug.example.internal
    configs:
      - source: caddy-config
        target: /etc/caddy/Caddyfile
      - source: snippet-global-sni
        target: /srv/globals/sni
      # Optional for related Caddy L4 compatibility issue (requires custom Caddy built with L4 module):
      - source: snippet-global-l4
        target: /srv/globals/l4
      # Toggle this to add/remove the wildcard snippet:
      - source: snippet-wildcard
        target: /srv/sites/wildcard

  debug:
    scale: 0 # Prevent this container starting with `docker compose up`
    image: localhost/debug
    build:
      dockerfile_inline: |
        FROM alpine
        RUN apk add curl step-cli

configs:
  caddy-config:
    content: |
      # Global Settings:
      {
        # Not entirely compatible with global SNI settings (due to need for manual workaround via a `tls` directive):
        local_certs

        # Caddy 2.9.x only (Default functionality in Caddy 2.10.0):
        #auto_https prefer_wildcard

        import /srv/globals/*
      }

      # WORKAROUND NOTE:
      # - `tls internal` must be used for `default_sni` / `fallback_sni` to work?
      #   `tls <cert> <key>` also works, thus may be something else related to this directive?
      #   However the directive does not need to be inside a site-block related to the SNI values.
      #   Nor is the directive required for Caddy 2.9.x when setting `auto_https prefer_wildcard`.
      # - Caddy 2.10.0 roughly enables equivalent `auto_https prefer_wildcard` functionality by default,
      #   but when a wildcard cert is provisioned/loaded, the global SNI settings are only compatible
      #   with wildcard SNI value and additionally requires the `tls` directive workaround
      #   (unlike Caddy 2.9.x with `auto_https prefer_wildcard`).
      # - Without either config workaround, `step certificate inspect` will fail with error:
      #   failed to connect: remote error: tls: internal error
      hello-world.localhost {
        tls internal
        abort
      }

      # This does not interfere despite provisioning a wildcard cert.
      # The wildcard cert caveat presumably only stops the SNI settings from working when
      # a valid wildcard cert match gets preferred but doesn't match the SNI value?
      *.localhost {
        abort
      }

      # default_sni: step certificate inspect --insecure https://172.18.0.2 | grep DNS
      # fallback_sni: step certificate inspect --insecure --servername invalid-value https://172.18.0.2 | grep DNS
      bug.{env.APEX_DOMAIN}, default.{env.APEX_DOMAIN}, fallback.{env.APEX_DOMAIN} {
        respond <<HEREDOC
          Hello from subdomain: {labels.2}

          HEREDOC
      }

      # Adds the wildcard site snippet when provided to the container:
      import /srv/sites/*

  snippet-wildcard:
    content: |
      *.{env.APEX_DOMAIN} {
        respond "wildcard cert"
      }

  snippet-global-sni:
    content: |
        # Placeholders are incompatible, must be an exact match to SAN?
        #default_sni hello.{env.APEX_DOMAIN}
        #fallback_sni bye.{env.APEX_DOMAIN}

        # Returns wildcard cert (provided one is provisioned/loaded):
        # Valid for Caddy 2.9.1 (even when `auto_https prefer_wildcard`)
        # Valid for Caddy 2.10.0 (requires `tls` directive workaround + wildcard cert provisioned/loaded)
        #default_sni *.example.internal
        #fallback_sni *.example.internal

        # This requires the `tls internal` directive workaround
        # Alternative workaround (Caddy 2.9.x) via `auto_https prefer_wildcard` (but only _without_ a wildcard cert available)
        # Similar to the alternative workaround, for Caddy 2.10.0 when a wildcard cert is available this SNI config will fail.
        default_sni default.example.internal
        fallback_sni fallback.example.internal

  # Related - These always return `failed to connect: EOF` - Global SNI not supported by Caddy L4?:
  # default_sni: step certificate inspect --insecure tcp://172.18.0.2:4443 | grep DNS
  # fallback_sni: step certificate inspect --insecure --servername invalid-value tcp://172.18.0.2:4443 | grep DNS
  # Direct queries are valid (they will return their direct certificate or when preferring wildcard, the wildcard cert):
  # step certificate inspect --insecure --servername bug.example.internal tcp://172.18.0.2:4443 | grep DNS
  snippet-global-l4:
    content: |
      layer4 {
        :4443 {
          @host-any tls sni_regexp .*\.example\.internal
          route @host-any {
            proxy caddy:443
          }
        }
      }

To run the example:

# Start Caddy container:
docker compose up -d --force-recreate
# Start the debug container and shell into it to run `step` commands:
docker compose run --rm -it debug ash

# Depending on success/failure, you'll get output from the below commands similar to:
# Success: `DNS:default.example.internal`
# Fail: `failed to connect: remote error: tls: internal error`
# Tip: For getting the IP you could use something like `ping bug.example.internal` within the `debug` container

# default_sni:
step certificate inspect --insecure https://172.18.0.2 | grep DNS
# fallback_sni:
step certificate inspect --insecure --servername invalid-value https://172.18.0.2 | grep DNS

# Optional: Caddy L4 (presently not compatible at all with the global SNI settings)
# default_sni:
step certificate inspect --insecure tcp://172.18.0.2:4443 | grep DNS
# fallback_sni:
step certificate inspect --insecure --servername invalid-value tcp://172.18.0.2:4443 | grep DNS

# Optional: You can verify a connection with curl by setting SNI this way:
curl --insecure --resolve bug.example.internal:443:172.18.0.2 https://bug.example.internal
# Caddy L4 equivalent (proxies port 4443 to 443 when matched successfully):
curl --insecure --resolve bug.example.internal:4443:172.18.0.2 https://bug.example.internal:4443

polarathene avatar Apr 22 '25 00:04 polarathene

So if I understand, essentially, this issue is:

> Instead for it to actually work, the SNI setting would need to be set to *.example.internal (instead of say default.example.internal).

and so you are requesting that the options of default_sni/fallback_sni support wildcard matches. (I don't know if it currently does, so, that's a fair feature request.)

> It would be nice for {env.ENV_NAME} support?

Probably trivial, no problem.

> Caddy L4 doesn't seem to be compatible (presumably these settings are only relevant to the HTTP app?)

As discussed in the linked issue in that repo, it's specifically for the tls app because it provides the connection policies used by the other apps like http.

> docs: default_sni should perhaps also mention the expectation of certificate to be provisioned (like fallback_sni)?

Not a bad idea, I guess I thought that was obvious.

mholt avatar Apr 22 '25 17:04 mholt

> so you are requesting that the options of default_sni/fallback_sni support wildcard matches.

👍

  • If default_sni *.example.internal then it will already match a wildcard certificate and return that.
  • If default_sni default.example.internal but a wildcard cert is available and preferred, this feature fails and you get an error, no certificate returned.

So yes, the expectation would be for a valid certificate to be matched when available. Due to Caddy 2.9 auto_https prefer_wildcard, or the equivalent default behaviour in Caddy 2.10, as soon as a wildcard cert is made available the functionality of default_sni / fallback_sni breaks: there is no direct cert match, only the wildcard, which doesn't match the requested SNI.
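
To make that expectation concrete, here is a small Go model of the selection flow being asked for. It is only a sketch of the desired behaviour, not Caddy's implementation. Settings, lookup and selectCert are hypothetical names, and the set of loaded certs is reduced to subject names.

```go
package main

import (
	"fmt"
	"strings"
)

// Settings mirrors the two global options from the report.
type Settings struct {
	DefaultSNI  string
	FallbackSNI string
}

// lookup returns the loaded certificate subject that serves name: an exact
// match first, then a wildcard replacing the first label.
func lookup(loaded map[string]bool, name string) (string, bool) {
	if name == "" {
		return "", false
	}
	if loaded[name] {
		return name, true
	}
	if _, rest, ok := strings.Cut(name, "."); ok {
		if wildcard := "*." + rest; loaded[wildcard] {
			return wildcard, true
		}
	}
	return "", false
}

// selectCert models the expected flow: default_sni stands in for a missing
// SNI, and fallback_sni is tried when nothing matches the requested name.
func selectCert(loaded map[string]bool, s Settings, sni string) (string, bool) {
	if sni == "" {
		sni = s.DefaultSNI
	}
	if subject, ok := lookup(loaded, sni); ok {
		return subject, true
	}
	return lookup(loaded, s.FallbackSNI)
}

func main() {
	loaded := map[string]bool{"*.example.internal": true}
	s := Settings{
		DefaultSNI:  "default.example.internal",
		FallbackSNI: "fallback.example.internal",
	}
	for _, sni := range []string{"", "invalid-value", "bug.example.internal"} {
		subject, ok := selectCert(loaded, s, sni)
		fmt.Printf("SNI %q -> cert %q (found=%v)\n", sni, subject, ok)
	}
}
```

In this model all three requests resolve to the wildcard cert, which is the outcome the reproduction expects in place of the tls: internal error.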


> > docs: default_sni should perhaps also mention the expectation of certificate to be provisioned (like fallback_sni)?
>
> Not a bad idea, I guess I thought that was obvious.

For http app perhaps, but with Caddy L4, it was not clear to me the tls sni ... matcher required a valid cert unless you use the handler to terminate TLS. In the reproduction it forwards that back to the http app site-block which should handle that (and presumably use default_sni / fallback_sni?)

The curl method seems to work, but it always provides an explicit SNI; there doesn't appear to be a way to make a request without SNI or with an invalid one 🤔

I found step certificate inspect to be useful for testing this. I couldn't verify against Caddy L4 though since default_sni / fallback_sni of the tls app doesn't appear compatible there? (otherwise the dig DNS client would have worked with no SNI provided)
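
For anyone without step available, the same kind of check can be sketched in Go. It relies on crypto/tls omitting the SNI extension when the target is an IP literal and ServerName is left empty. startRecorder and inspect are illustrative names. The local httptest server is only a stand-in for Caddy, serving Go's built-in test certificate, so that the sketch runs without the compose setup.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// startRecorder starts a local TLS test server (a stand-in for Caddy) and
// reports the SNI value of each ClientHello on the returned channel.
func startRecorder() (*httptest.Server, <-chan string) {
	seen := make(chan string, 8)
	srv := httptest.NewUnstartedServer(http.NotFoundHandler())
	srv.TLS = &tls.Config{
		GetConfigForClient: func(hello *tls.ClientHelloInfo) (*tls.Config, error) {
			seen <- hello.ServerName // empty when the client sent no SNI
			return nil, nil          // nil config: keep the server's own config
		},
	}
	srv.StartTLS()
	return srv, seen
}

// inspect handshakes with addr and returns the leaf certificate's DNS SANs.
// With an IP-literal addr and an empty serverName, Go sends no SNI.
func inspect(addr, serverName string) ([]string, error) {
	conn, err := tls.Dial("tcp", addr, &tls.Config{
		ServerName:         serverName,
		InsecureSkipVerify: true, // same intent as `--insecure`
	})
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return nil, errors.New("server sent no certificate")
	}
	return certs[0].DNSNames, nil
}

func main() {
	srv, seen := startRecorder()
	defer srv.Close()
	addr := srv.Listener.Addr().String() // loopback IP and an ephemeral port

	for _, sni := range []string{"", "invalid-value"} {
		names, err := inspect(addr, sni)
		if err != nil {
			fmt.Println("failed to connect:", err)
			continue
		}
		fmt.Printf("sent %q, server saw %q, DNS SANs: %v\n", sni, <-seen, names)
	}
}
```

Pointing inspect at 172.18.0.2:443 from the reproduction should mirror the two step commands: an empty serverName exercises default_sni and invalid-value exercises fallback_sni.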

I was a bit more surprised that a wildcard SNI was needed to match a wildcard cert, or that it was a valid value 😅 At that point I think it clicked that it was matching the SAN / CommonName of a cert Caddy had loaded.

polarathene avatar Apr 22 '25 22:04 polarathene

It might also be good to have clarity about the tls directive workaround requirement? (either document workaround required or fix if a bug)

I have not tried with the default ACME provisioning, so perhaps the local_certs global option affected that. I know that auto_https prefer_wildcard in Caddy 2.9 also worked as a workaround (provided there was no matching wildcard cert).

I assume the feature should work without experiencing this caveat:

# Global Settings:
{
  local_certs

  default_sni default.example.internal
  fallback_sni fallback.example.internal
}

# WORKAROUND NOTE:
# - `tls internal` must be used for `default_sni` / `fallback_sni` to work?
#   `tls <cert> <key>` also works, thus may be something else related to this directive?
# - Might be due to using `local_certs`? I have not tested with publicly provisioned certs
# - NOTE: The directive just needs to be used somewhere,
#   it does not need to be associated to the configured SNI values
hello-world.localhost {
  tls internal
  abort
}

# Use the Smallstep `step` CLI command to verify:
# default_sni: step certificate inspect --insecure https://172.18.0.2 | grep DNS
# fallback_sni: step certificate inspect --insecure --servername invalid-value https://172.18.0.2 | grep DNS
bug.example.internal, default.example.internal, fallback.example.internal {
  respond <<HEREDOC
    Hello from subdomain: {labels.2}

    HEREDOC
}

Wildcard matching Caddyfile

For completeness, I'll provide two Caddyfiles to reference. These are specifically for Caddy 2.10.0.

Expectation (*.example.internal cert is returned when no SNI or invalid SNI is provided):

{
  local_certs

  default_sni default.example.internal
  fallback_sni fallback.example.internal
}

# Caddy 2.10 breaks `default_sni` / `fallback_sni` when this site-block exists:
*.example.internal {
  tls internal
  abort
}

bug.example.internal, default.example.internal, fallback.example.internal {
  respond <<HEREDOC
    Hello from subdomain: {labels.2}

    HEREDOC
}

If tls cert.pem key.pem was used to load an explicit default.example.internal / fallback.example.internal certificate, the above would work, as those certs will be selected despite wildcard preference. The naming is perhaps a bit misleading: the config may imply that a wildcard cert is preferred over an explicit SAN, but the preference is really towards avoiding provisioning a separate SAN cert when an existing wildcard cert match is available. I know you know this, but other readers might not be familiar with it (I know I've probably mixed that up when referring to the functionality before).

As such this presently does work:

{
  local_certs

  # Wildcard provisioned cert will be used:
  # (Even clients can request a wildcard SNI value like this without these settings)
  default_sni *.example.internal
  fallback_sni *.example.internal
}

*.example.internal {
  tls internal
  abort
}

bug.example.internal, default.example.internal, fallback.example.internal {
  respond <<HEREDOC
    Hello from subdomain: {labels.2}

    HEREDOC
}

polarathene avatar Apr 22 '25 22:04 polarathene