
Empty reply from `/metrics` on secure admin listener

sushain97 opened this issue 8 months ago • 5 comments

I'm running caddy:2.9.1-alpine with a JSON config and have metrics enabled. Accessing the /metrics endpoint over the secure admin listener returns an empty response. This used to work, but I suspect it broke with an upgrade and I didn't notice until now :(

It works over the plaintext listener:

/srv $ http_proxy= wget -O- http://127.0.0.1:2019/metrics | head
Connecting to 127.0.0.1:2019 (127.0.0.1:2019)
writing to stdout
# HELP caddy_admin_http_requests_total Counter of requests made to the Admin API's HTTP endpoints.
# TYPE caddy_admin_http_requests_total counter
caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/metrics"} 2
caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/reverse_proxy/upstreams"} 273
# HELP caddy_config_last_reload_success_timestamp_seconds Timestamp of the last successful configuration reload.
# TYPE caddy_config_last_reload_success_timestamp_seconds gauge
caddy_config_last_reload_success_timestamp_seconds 1.7432255318608303e+09
# HELP caddy_config_last_reload_successful Whether the last configuration reload attempt was successful.
# TYPE caddy_config_last_reload_successful gauge
caddy_config_last_reload_successful 1

Other requests work fine over the secure listener:

sushain@vesuvianite ~ ❯❯❯ curl -fsS --cert /tmp/cert --key /tmp/key --resolve web.local.skc.name:8001:127.0.0.1 https://web.local.skc.name:8001/reverse_proxy/upstreams | jq length
27

But, this doesn't:

sushain@vesuvianite ~ ❯❯❯ curl -v -fsS --cert /tmp/cert --key /tmp/key --resolve web.local.skc.name:8001:127.0.0.1 https://web.local.skc.name:8001/metrics
* Added web.local.skc.name:8001:127.0.0.1 to DNS cache
* Hostname web.local.skc.name was found in DNS cache
*   Trying 127.0.0.1:8001...
* Connected to web.local.skc.name (127.0.0.1) port 8001
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, CERT verify (15):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 / X25519 / id-ecPublicKey
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
*  subject: CN=web.local.skc.name
*  start date: Mar 25 21:37:40 2025 GMT
*  expire date: Jun 23 21:37:39 2025 GMT
*  subjectAltName: host "web.local.skc.name" matched cert's "web.local.skc.name"
*  issuer: C=US; O=Let's Encrypt; CN=E5
*  SSL certificate verify ok.
*   Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA384
*   Certificate level 1: Public key type EC/secp384r1 (384/192 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 2: Public key type RSA (4096/152 Bits/secBits), signed using sha256WithRSAEncryption
* using HTTP/1.x
> GET /metrics HTTP/1.1
> Host: web.local.skc.name:8001
> User-Agent: curl/8.5.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS alert, close notify (256):
* Empty reply from server
* Closing connection
* TLSv1.3 (OUT), TLS alert, close notify (256):
curl: (52) Empty reply from server

The requests are going over a kubectl port forward:

sushain@vesuvianite ~ ❯❯❯ kubectl port-forward -n web service/caddy-admin 8001:443
Forwarding from 127.0.0.1:8001 -> 2021
Forwarding from [::1]:8001 -> 2021
Handling connection for 8001
Handling connection for 8001

My config is long, but straightforward:

{
    "apps": {
        "http": {
            "servers": {
                "srv0": {
                    "listen": [
                        ":443"
                    ],
                    "routes": [ <SNIP> ],
                    "named_routes": {
                        "authentik-reverse-proxy": {
                            "handle": [
                                {
                                    "handler": "reverse_proxy",
                                    "upstreams": [
                                        {
                                            "dial": "authentik-server.idp.svc.cluster.local:80"
                                        }
                                    ]
                                }
                            ]
                        }
                    }
                },
                "srv1": {
                    "listen": [
                        ":8843"
                    ],
                    "routes": [ <SNIP> ]
                }
            },
            "metrics": {
                "per_host": true
            }
        },
        "tls": {
            <SNIP>
        }
    },
    "admin": {
        "identity": {
            "identifiers": [
                "web.local.skc.name"
            ]
        },
        "remote": {
            "access_control": [
                {
                    "public_keys": [
                        "MIIFazCCA1OgAwIBAgIUPgOv0jkoNYNZLTI556de68UST0QwDQYJKoZIhvcNAQELBQAwRTELMAkGA1UEBhMCQVUxEzARBgNVBAgMClNvbWUtU3RhdGUxITAfBgNVBAoMGEludGVybmV0IFdpZGdpdHMgUHR5IEx0ZDAeFw0yMzA5MDQwMjE4NTJaFw00MzA4MzAwMjE4NTJaMEUxCzAJBgNVBAYTAkFVMRMwEQYDVQQIDApTb21lLVN0YXRlMSEwHwYDVQQKDBhJbnRlcm5ldCBXaWRnaXRzIFB0eSBMdGQwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQCeqTNpVKxOKs+5Dg5mk4moaTSGNC9wJL3uWn+GHvX1rHDJrB+CRILmE5LG1rWu3j6R42INYbo25Ngdhf3mFh3uGrNBecw9G+HwVfL5FXDWE06mgyn/N/eGyNQshwZeyANkO9SlBf9YrpnMb8DTowNev4Jq9/C2kmQAKlRSqzj8iOODmJf4ToFtFiCK9vwNKQwkN5FZ368yelKTsFqZFzOhU/0FwiZdgFmRj2Rf6mkfLl0c08BzwNUatr+BFHGtrJKraIJVnwW9i9ZhE5hayJhhBBuzrodYez/eFkUT2y9z+SNSELxQREwwzI47/xes/geWGD+YKRxQrdKN9A6NTnxpJ3pVA9BJlkrhG1Z62D78R9akrKTzMceuk7oP9x2GLe+hfwhc5ghSaamj4Kv8eHIWbx0fdSj7jOxmEc+9//zn7EXP6p/l8DlBaLMgPKUerDVSEpg1X0cy4Fktf2sVI2+U+RVA2ju/bti59QX5wS47kPh6Uj0Bc1SXTQ3LLRj7alLx+C0cRk2BCyQ0nSm6zEWVZl+/Jl8hlcX64rWydW4uXY7drre61CfC6LmHJfl6x/UYKslEGc+vRRAJz6CQNX9fRRA9oSnvP7NuYWzMLygtS5DpkbbRwDI5nwiqVyJh2oSlymbgaDPT4y33/CdihbySf3DrDJiuod1n0LT31lt1BQIDAQABo1MwUTAdBgNVHQ4EFgQUTziMqC5C95vDbjrw4OJCKQUQkj4wHwYDVR0jBBgwFoAUTziMqC5C95vDbjrw4OJCKQUQkj4wDwYDVR0TAQH/BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAgEAGmZu2qLK1MrGzVtn+rh3ZmxpxOqGxn+M00Mk5eGpSJ1Xg8UlvLUcUr969ORqbNkXIRvRnx3nUjO552IurrGr95Q1V/0r2KjvH1hmJd/CACnXPXoqxfAZd9n0/KhuAtnEiMQQF3Dap7At1S4uffwge/E6q6kUXR59n9kT9NUo0nL44ORv3slSWIKCoPGKNXjOmTBRJz2fJ5zknACjVC+3LX4mf/mMpFmLc8aWNs+jbJeltBqIbmTYAV3cLgv1G14bwC++vrQoQLeYCsyUkohpLRidcgv/A4Cuh4/ft1xBJa1wYAUZFr5RtyTeLOE9jWdZ9nUWoWDvw539tfJc2DJ2qJKgHOh/3JFa/JOJfSP1SuP+UhDWWZV6Oa9rHbCBE2nD3SobAzNgV5aw72ecZnrVUu9i+AGZ5fSyPdT5BluFWsCpcWxvv7qFqp0xrGY+p+TU+GK449gwrOjUjCvVk3UMxIGF/SY/npaE+5hPvuwnOFOKC886gXpj5LNxwWE7TtlyNKLIzQQ5im1veznhLlQlRwDrOmWOC1BThKXRyPt5rUn7r4Z8Rw48diZgDJ44VRD94q1Jcnu5kjJPuA2gtFj2oaf9VL/9sIDwdGfWsPeQUtzmtsw4S5W200RD85EC1SafvVz2XxummsP/B01pyTdR2rdp5X4K0ym4qV6lYUxPmw0="
                    ],
                    "permissions": [
                        {
                            "paths": [
                                "/"
                            ],
                            "methods": [
                                "GET"
                            ]
                        }
                    ]
                }
            ]
        }
    }
}

sushain97 avatar Mar 29 '25 06:03 sushain97

The admin endpoint doesn't accept requests on the public secure port, i.e. 443. It only accepts requests on the remote admin listener, which you have to configure via listen. This is a config issue.

mohammed90 avatar Mar 31 '25 08:03 mohammed90

@mohammed90 I believe I am accessing the remote admin endpoint, not the secure port.

Note that the kubectl port forward is actually forwarding to port 2021 inside the container. Moreover, the /reverse_proxy/upstreams endpoint works perfectly at the same address.

sushain97 avatar Mar 31 '25 15:03 sushain97

But the remote admin endpoint isn't in your configuration. If there's something missing, please share full details without redaction.

mohammed90 avatar Apr 01 '25 19:04 mohammed90

@mohammed90, I've been relying on the documented default value of :2021.

That being said, I went ahead and configured it explicitly:

"admin": {
    "identity": {
        "identifiers": [
            "web.local.skc.name"
        ]
    },
    "remote": {
        "listen": ":2021",
        "access_control": [
            {
                "public_keys": [
                    "MIIFazCCA1OgAwIBAgIUPgOv0jkoNYNZLTI556de68UST0QwDQYJKoZIhvcNAQELBQAwRTELMAkGA1UEBhMCQVUxEzARBgNVBAgMClNvbWUtU3RhdGUxITAfBgNVBAoMGEludGVybmV0IFdpZGdpdHMgUHR5IEx0ZDAeFw0yMzA5MDQwMjE4NTJaFw00MzA4MzAwMjE4NTJaMEUxCzAJBgNVBAYTAkFVMRMwEQYDVQQIDApTb21lLVN0YXRlMSEwHwYDVQQKDBhJbnRlcm5ldCBXaWRnaXRzIFB0eSBMdGQwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQCeqTNpVKxOKs+5Dg5mk4moaTSGNC9wJL3uWn+GHvX1rHDJrB+CRILmE5LG1rWu3j6R42INYbo25Ngdhf3mFh3uGrNBecw9G+HwVfL5FXDWE06mgyn/N/eGyNQshwZeyANkO9SlBf9YrpnMb8DTowNev4Jq9/C2kmQAKlRSqzj8iOODmJf4ToFtFiCK9vwNKQwkN5FZ368yelKTsFqZFzOhU/0FwiZdgFmRj2Rf6mkfLl0c08BzwNUatr+BFHGtrJKraIJVnwW9i9ZhE5hayJhhBBuzrodYez/eFkUT2y9z+SNSELxQREwwzI47/xes/geWGD+YKRxQrdKN9A6NTnxpJ3pVA9BJlkrhG1Z62D78R9akrKTzMceuk7oP9x2GLe+hfwhc5ghSaamj4Kv8eHIWbx0fdSj7jOxmEc+9//zn7EXP6p/l8DlBaLMgPKUerDVSEpg1X0cy4Fktf2sVI2+U+RVA2ju/bti59QX5wS47kPh6Uj0Bc1SXTQ3LLRj7alLx+C0cRk2BCyQ0nSm6zEWVZl+/Jl8hlcX64rWydW4uXY7drre61CfC6LmHJfl6x/UYKslEGc+vRRAJz6CQNX9fRRA9oSnvP7NuYWzMLygtS5DpkbbRwDI5nwiqVyJh2oSlymbgaDPT4y33/CdihbySf3DrDJiuod1n0LT31lt1BQIDAQABo1MwUTAdBgNVHQ4EFgQUTziMqC5C95vDbjrw4OJCKQUQkj4wHwYDVR0jBBgwFoAUTziMqC5C95vDbjrw4OJCKQUQkj4wDwYDVR0TAQH/BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAgEAGmZu2qLK1MrGzVtn+rh3ZmxpxOqGxn+M00Mk5eGpSJ1Xg8UlvLUcUr969ORqbNkXIRvRnx3nUjO552IurrGr95Q1V/0r2KjvH1hmJd/CACnXPXoqxfAZd9n0/KhuAtnEiMQQF3Dap7At1S4uffwge/E6q6kUXR59n9kT9NUo0nL44ORv3slSWIKCoPGKNXjOmTBRJz2fJ5zknACjVC+3LX4mf/mMpFmLc8aWNs+jbJeltBqIbmTYAV3cLgv1G14bwC++vrQoQLeYCsyUkohpLRidcgv/A4Cuh4/ft1xBJa1wYAUZFr5RtyTeLOE9jWdZ9nUWoWDvw539tfJc2DJ2qJKgHOh/3JFa/JOJfSP1SuP+UhDWWZV6Oa9rHbCBE2nD3SobAzNgV5aw72ecZnrVUu9i+AGZ5fSyPdT5BluFWsCpcWxvv7qFqp0xrGY+p+TU+GK449gwrOjUjCvVk3UMxIGF/SY/npaE+5hPvuwnOFOKC886gXpj5LNxwWE7TtlyNKLIzQQ5im1veznhLlQlRwDrOmWOC1BThKXRyPt5rUn7r4Z8Rw48diZgDJ44VRD94q1Jcnu5kjJPuA2gtFj2oaf9VL/9sIDwdGfWsPeQUtzmtsw4S5W200RD85EC1SafvVz2XxummsP/B01pyTdR2rdp5X4K0ym4qV6lYUxPmw0="
                ],
                "permissions": [
                    {
                        "paths": [
                            "/"
                        ],
                        "methods": [
                            "GET"
                        ]
                    }
                ]
            }
        ]
    }
}

This is the complete admin block in my config.

The issue persists.

Note that once again, the /reverse_proxy/upstreams endpoint works. It's only /metrics which does not.

sushain@vesuvianite ~ ❯❯❯ curl -sf --cert /tmp/cert --key /tmp/key --resolve web.local.skc.name:8001:127.0.0.1 https://web.local.skc.name:8001/reverse_proxy/upstreams | jq | head
[
  {
    "address": "homebox.inventory.svc.cluster.local:80",
    "num_requests": 0,
    "fails": 0
  },
  {
    "address": "esphome.home.svc.cluster.local:80",
    "num_requests": 0,
    "fails": 0

sushain@vesuvianite ~ ❯❯❯ curl -v -fsS --cert /tmp/cert --key /tmp/key --resolve web.local.skc.name:8001:127.0.0.1 https://web.local.skc.name:8001/metrics
* Added web.local.skc.name:8001:127.0.0.1 to DNS cache
* Hostname web.local.skc.name was found in DNS cache
*   Trying 127.0.0.1:8001...
* Connected to web.local.skc.name (127.0.0.1) port 8001
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, CERT verify (15):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 / X25519 / id-ecPublicKey
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
*  subject: CN=web.local.skc.name
*  start date: Mar 25 21:37:40 2025 GMT
*  expire date: Jun 23 21:37:39 2025 GMT
*  subjectAltName: host "web.local.skc.name" matched cert's "web.local.skc.name"
*  issuer: C=US; O=Let's Encrypt; CN=E5
*  SSL certificate verify ok.
*   Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA384
*   Certificate level 1: Public key type EC/secp384r1 (384/192 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 2: Public key type RSA (4096/152 Bits/secBits), signed using sha256WithRSAEncryption
* using HTTP/1.x
> GET /metrics HTTP/1.1
> Host: web.local.skc.name:8001
> User-Agent: curl/8.5.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS alert, close notify (256):
* Empty reply from server
* Closing connection
* TLSv1.3 (OUT), TLS alert, close notify (256):
curl: (52) Empty reply from server

sushain97 avatar Apr 02 '25 05:04 sushain97

Gotcha. I'll give it a look.

mohammed90 avatar Apr 02 '25 05:04 mohammed90

@mohammed90 Just dropping a note here: I've done a bit of technical analysis on this issue, since I was running into it on the latest main Caddy branch.

It would appear that this functionality breaks when the newAdminHandler() function gets called via the replaceRemoteAdminServer() call that executes during the final setup phase.

As you can see here, we New() up each module as an AdminRouter, but we never call Provision: https://github.com/caddyserver/caddy/blob/aa3d20be3ee451af9465470a28937690104e9422/admin.go#L272-L274
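
For contrast, the registration loop at those lines is roughly the following (a paraphrased sketch of the linked code, not an exact copy): each admin.api module is constructed with New() and type-asserted straight to AdminRouter, and Provision is never called on it.

	// paraphrased sketch of the linked registration loop: the module is
	// constructed and used as an AdminRouter, but never provisioned
	for _, m := range GetModules("admin.api") {
		router := m.New().(AdminRouter)
		for _, route := range router.Routes() {
			addRoute(route.Pattern, handlerLabel, route.Handler)
		}
		admin.routers = append(admin.routers, router)
	}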

As a result, when the following code gets called, the metricsHandler and registry members are nil, resulting in an internal panic.

https://github.com/caddyserver/caddy/blob/aa3d20be3ee451af9465470a28937690104e9422/modules/metrics/adminmetrics.go#L64-L65
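
To make the failure mode concrete, here is a minimal, self-contained sketch (not Caddy's actual code; the type and field names are invented for illustration) of why an unprovisioned nil handler field surfaces as an empty reply rather than an error response: the nil dereference panics inside the handler, and net/http recovers the panic by dropping the connection.

package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// metricsModule stands in for an admin module whose handler is only
// populated by a provisioning step.
type metricsModule struct {
	metricsHandler http.Handler // stays nil because "Provision" never ran
}

func (m *metricsModule) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// nil pointer dereference: the handler field was never provisioned
	m.metricsHandler.ServeHTTP(w, r)
}

func main() {
	srv := httptest.NewServer(&metricsModule{}) // no provisioning step
	defer srv.Close()

	_, err := http.Get(srv.URL + "/metrics")
	// The server recovers the panic and closes the connection, so the
	// client sees EOF instead of an HTTP response -- curl reports this
	// as "Empty reply from server".
	fmt.Println("client error:", err)
}

Running this prints a nil-pointer panic in the server log and a plain EOF on the client side, which matches the curl (52) behavior above.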

I've managed to hack around this by checking whether each module is a Provisioner and calling Provision() accordingly within newAdminHandler():

	// register third-party module endpoints
	for _, m := range GetModules("admin.api") {
		mod := m.New()
		// If this mod is a provisioner, then provision it
		// before using it. This is a bit of a hack, but it
		// works for now. We can revisit this later if we
		// need to.
		if provisioner, ok := mod.(Provisioner); ok {
			err := provisioner.Provision(ctx)
			if err != nil {
				Log().Named("admin").Error("provisioning admin module", zap.Error(err))
				continue
			}
		}

		router := mod.(AdminRouter)
		for _, route := range router.Routes() {
			addRoute(route.Pattern, handlerLabel, route.Handler)
		}
		admin.routers = append(admin.routers, router)
	}

After these changes, I was able to hit my remote admin mTLS endpoint and grab /metrics without a problem. I'm new here and would submit a PR, but I'm concerned that this is an ugly hack, so I'd love to see what other solutions there might be and learn accordingly.

Jimmy

Compy avatar May 03 '25 20:05 Compy

Thank you for the investigation! Please feel free to send the PR, @Compy, even if you feel it's hacky. We can iterate on it.

mohammed90 avatar May 03 '25 22:05 mohammed90

Thank you @Compy! And thank you for your ongoing sponsorship 😊

mholt avatar May 04 '25 00:05 mholt