Setting default_sni to a value that won't generate a cert can cause memory exhaustion
If default_sni is set to a value for which a certificate can't be generated, Caddy appears to start a new "getCertDuringHandshake" call whenever a connection with no valid SNI is made. These calls never seem to finish, and memory usage keeps growing until the process is eventually killed by the orchestrator.
This was a misconfiguration on my part: I mistakenly deployed a change that caused the HOSTNAME env variable, usually set to the load balancer hostname, to be set to the Docker container ID instead.
{
	default_sni {$HOSTNAME}
}
With debug logging enabled, the log is flooded with this message.
{"level":"debug","ts":1739193344.6278892,"logger":"tls.handshake","msg":"no matching certificates and no custom selection logic","identifier":"<lb ip>"}
Version:
Caddy v2.9.1
Modules:
caddy.storage.consul
dns.providers.cloudflare
dns.providers.powerdns
supervisor
Hmm, I'm not really sure what to do about this though. We can't know you made a mistake like that, I don't think...
I'm also not really sure what that graph is. What am I looking at? 85 MB memory usage? That's pretty normal when there's traffic. Is there an actual leak?
The graph shows the memory usage of getCertDuringHandshake (please correct me if I'm interpreting it wrong). The number there matched the memory usage of the Caddy process when the pprof dump was taken.
Perhaps Caddy could exit if default_sni does not work or throw a warning?
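Until something like that exists, a check in the deploy pipeline could catch this particular mistake before Caddy starts. The following is only a rough sketch: `checkDefaultSNI` is a hypothetical helper, not part of Caddy, and its rules are heuristics (for example, a dot-less name can be perfectly valid with an internal issuer). It rejects an empty value, a value that looks like a Docker container ID (Docker uses the 12-character hex short ID as the default container hostname), and a name without a dot.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// containerIDPattern matches a 12-character short or 64-character full
// hex ID, which is what a Docker container ID looks like.
var containerIDPattern = regexp.MustCompile(`^([0-9a-f]{12}|[0-9a-f]{64})$`)

// checkDefaultSNI is a hypothetical pre-deploy check (not a Caddy API).
// It returns an error when name is unlikely to be a hostname that a
// public CA would issue a certificate for. The rules are heuristics.
func checkDefaultSNI(name string) error {
	name = strings.TrimSpace(name)
	switch {
	case name == "":
		return fmt.Errorf("default_sni is empty")
	case containerIDPattern.MatchString(name):
		return fmt.Errorf("default_sni %q looks like a container ID", name)
	case !strings.Contains(name, "."):
		return fmt.Errorf("default_sni %q is not a fully qualified name", name)
	}
	return nil
}

func main() {
	// HOSTNAME is the variable used in the Caddyfile from this issue.
	if err := checkDefaultSNI(os.Getenv("HOSTNAME")); err != nil {
		fmt.Fprintln(os.Stderr, "refusing to start:", err)
		os.Exit(1)
	}
	fmt.Println("default_sni value looks plausible")
}
```

Running this as an entrypoint step before Caddy would turn the silent misconfiguration into an immediate, visible failure.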
I see, so that one function is using 9.8 GB instantaneously (i.e. not cumulatively)?
That does seem like a problem... could you grab a profile? https://caddyserver.com/docs/profiling -- heap and goroutine dump would be useful I think.
(Did you mean to close this?)
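For anyone else collecting the same data, here is a minimal sketch of grabbing both dumps in one go. It assumes the admin endpoint is reachable at its default address, localhost:2019, and that pprof is served under /debug/pprof/ as described on the profiling page linked above. The helper name `profileURL` and the output file names are made up for this example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// profileURL builds the pprof URL for a named profile. A debug level
// above zero asks for the human-readable text form.
func profileURL(base, name string, debug int) string {
	u := strings.TrimRight(base, "/") + "/debug/pprof/" + name
	if debug > 0 {
		u += fmt.Sprintf("?debug=%d", debug)
	}
	return u
}

// saveProfile downloads url and writes the response body to path.
func saveProfile(client *http.Client, url, path string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: unexpected status %s", url, resp.Status)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	base := "http://localhost:2019" // assumed default admin address
	client := &http.Client{Timeout: 30 * time.Second}
	dumps := []struct {
		name  string
		debug int
		path  string
	}{
		{"heap", 0, "heap.pprof"},          // binary, for `go tool pprof`
		{"goroutine", 2, "goroutines.txt"}, // full text stack dump
	}
	for _, d := range dumps {
		if err := saveProfile(client, profileURL(base, d.name, d.debug), d.path); err != nil {
			fmt.Fprintln(os.Stderr, "error:", err)
			os.Exit(1)
		}
		fmt.Println("wrote", d.path)
	}
}
```

The heap file can then be opened with `go tool pprof heap.pprof`, and the goroutine dump is plain text that can be attached as-is.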
I have the matching heap and goroutine dump.
Are they safe to post?
Edit: Nope, didn't mean to close, not sure how I managed that.
Yeah, profiles are technically safe to share.
Reopened. I had a GitHub helper extension that was breaking the issue page, my bad.
Thanks, I'll take a look!