certmagic icon indicating copy to clipboard operation
certmagic copied to clipboard

Decouple cleanup context from obtaining context

Open aplr opened this issue 1 year ago • 2 comments

What version of the package are you using?

v0.21.4

What are you trying to do?

Cleaning up after a failed obtain attempt

What steps did you take?

Configured Certmagic to issue certificates on-demand

What did you expect to happen, and what actually happened instead?

Obtaining an on-demand certificate runs into the configured timeout of 180s:

https://github.com/caddyserver/certmagic/blob/c783cbd36b24edd2e94d0af1e9c1894d8619db68/handshake.go#L524-L526

If that happens, I expect the cleanup routines to succeed.

https://github.com/mholt/acmez/blob/527e47cae3f84fa3a92d1d9b9c21c5eb0b44359a/client.go#L316-L323

https://github.com/mholt/acmez/blob/527e47cae3f84fa3a92d1d9b9c21c5eb0b44359a/client.go#L337

However, the cleanup routines are bound to the cancelled context, and won't run as the context was cancelled.

Show error logs
{
  "insertId": "19ucu8ye3zmtq",
  "jsonPayload": {
    "msg": "cleaning up solver",
    "challenge_type": "http-01",
    "level": "error",
    "ts": 1729163825.755899,
    "identifier": "redacted.com",
    "caller": "[email protected]/client.go:544",
    "time": "2024-10-17T11:17:05.75616761Z",
    "stacktrace": "github.com/mholt/acmez/v2.(*Client).pollAuthorization\n\t/go/pkg/mod/github.com/mholt/acmez/[email protected]/client.go:544\ngithub.com/mholt/acmez/v2.(*Client).solveChallenges\n\t/go/pkg/mod/github.com/mholt/acmez/[email protected]/client.go:378\ngithub.com/mholt/acmez/v2.(*Client).ObtainCertificate\n\t/go/pkg/mod/github.com/mholt/acmez/[email protected]/client.go:136\ngithub.com/caddyserver/certmagic.(*ACMEIssuer).doIssue\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/acmeissuer.go:477\ngithub.com/caddyserver/certmagic.(*ACMEIssuer).Issue\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/acmeissuer.go:371\ngithub.com/aplr/redacted/tlsx.(*conditionalIssuer).Issue\n\t/app/tlsx/issuer.go:111\ngithub.com/caddyserver/certmagic.(*Config).obtainCert.func2\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/config.go:626\ngithub.com/caddyserver/certmagic.doWithRetry\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/async.go:104\ngithub.com/caddyserver/certmagic.(*Config).obtainCert\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/config.go:700\ngithub.com/caddyserver/certmagic.(*Config).ObtainCertAsync\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/config.go:505\ngithub.com/caddyserver/certmagic.(*Config).obtainOnDemandCertificate\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/handshake.go:531\ngithub.com/caddyserver/certmagic.(*Config).getCertDuringHandshake\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/handshake.go:369\ngithub.com/caddyserver/certmagic.(*Config).GetCertificateWithContext\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/handshake.go:92\ngithub.com/caddyserver/certmagic.(*Config).GetCertificate\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/handshake.go:49\ncrypto/tls.(*Config).getCertificate\n\t/usr/local/go/src/crypto/tls/common.go:1196\ncrypto/tls.(*serverHandshakeStateTLS13).pickCertificate\n\t/usr/local/go/src/crypto/tls/handshake_server_tls13.go:475\ncrypto/tls.(*serverHandshakeStateTLS13).handshake\n\t/usr/local/go/src/crypto/tls/handshake_server_tls13.go:61\ncrypto/tls.(*Conn).serverHandshake\n\t/usr/local/go/src/crypto/tls/handshake_server.go:54\ncrypto/tls.(*Conn).handshakeContext\n\t/usr/local/go/src/crypto/tls/conn.go:1568\ncrypto/tls.(*Conn).HandshakeContext\n\t/usr/local/go/src/crypto/tls/conn.go:1508\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1971",
    "env": "production",
    "cos.googleapis.com/stream": "stderr",
    "error": "deleting object acme/acme.zerossl.com-v2-dv90/challenge_tokens/redacted.com.json: context deadline exceeded",
    "logger": "app.https.tls.certmagic.acme_issuer.acme_client",
  }
}
{
  "insertId": "19ucu8ye3zmts",
  "jsonPayload": {
    "env": "production",
    "stacktrace": "github.com/mholt/acmez/v2.(*Client).solveChallenges.func1\n\t/go/pkg/mod/github.com/mholt/acmez/[email protected]/client.go:340\ngithub.com/mholt/acmez/v2.(*Client).solveChallenges\n\t/go/pkg/mod/github.com/mholt/acmez/[email protected]/client.go:380\ngithub.com/mholt/acmez/v2.(*Client).ObtainCertificate\n\t/go/pkg/mod/github.com/mholt/acmez/[email protected]/client.go:136\ngithub.com/caddyserver/certmagic.(*ACMEIssuer).doIssue\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/acmeissuer.go:477\ngithub.com/caddyserver/certmagic.(*ACMEIssuer).Issue\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/acmeissuer.go:371\ngithub.com/aplr/redacted/tlsx.(*conditionalIssuer).Issue\n\t/app/tlsx/issuer.go:111\ngithub.com/caddyserver/certmagic.(*Config).obtainCert.func2\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/config.go:626\ngithub.com/caddyserver/certmagic.doWithRetry\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/async.go:104\ngithub.com/caddyserver/certmagic.(*Config).obtainCert\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/config.go:700\ngithub.com/caddyserver/certmagic.(*Config).ObtainCertAsync\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/config.go:505\ngithub.com/caddyserver/certmagic.(*Config).obtainOnDemandCertificate\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/handshake.go:531\ngithub.com/caddyserver/certmagic.(*Config).getCertDuringHandshake\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/handshake.go:369\ngithub.com/caddyserver/certmagic.(*Config).GetCertificateWithContext\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/handshake.go:92\ngithub.com/caddyserver/certmagic.(*Config).GetCertificate\n\t/go/pkg/mod/github.com/caddyserver/[email protected]/handshake.go:49\ncrypto/tls.(*Config).getCertificate\n\t/usr/local/go/src/crypto/tls/common.go:1196\ncrypto/tls.(*serverHandshakeStateTLS13).pickCertificate\n\t/usr/local/go/src/crypto/tls/handshake_server_tls13.go:475\ncrypto/tls.(*serverHandshakeStateTLS13).handshake\n\t/usr/local/go/src/crypto/tls/handshake_server_tls13.go:61\ncrypto/tls.(*Conn).serverHandshake\n\t/usr/local/go/src/crypto/tls/handshake_server.go:54\ncrypto/tls.(*Conn).handshakeContext\n\t/usr/local/go/src/crypto/tls/conn.go:1568\ncrypto/tls.(*Conn).HandshakeContext\n\t/usr/local/go/src/crypto/tls/conn.go:1508\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1971",
    "ts": 1729163825.757464,
    "identifier": "redacted.com",
    "caller": "[email protected]/client.go:340",
    "level": "error",
    "time": "2024-10-17T11:17:05.757641208Z",
    "cos.googleapis.com/stream": "stderr",
    "authz": "https://acme.zerossl.com/v2/DV90/authz/redacted",
    "logger": "app.https.tls.certmagic.acme_issuer.acme_client",
    "msg": "deactivating authorization",
    "error": "attempt 1: https://acme.zerossl.com/v2/DV90/authz/redacted: context deadline exceeded",
  },
}

How do you think this should be fixed?

Allow to provide a long-running context for cleanup routines, which is passed to acmez, if this makes sense.

Bonus: What do you use CertMagic for, and do you find it useful?

Creating on-demand certificates for parked domains

aplr avatar Oct 17 '24 11:10 aplr

Ah, yeah, that's tricky. Cleanup probably needs to be given a separate, uncanceled context, but we have to really trust that it won't take long to run. Maybe we could pass in a context with a new cancel (like a new timeout, but long enough).

mholt avatar Oct 18 '24 17:10 mholt

My idea was to allow passing a custom context to acmez.Client, like a context for background tasks:

https://github.com/mholt/acmez/blob/527e47cae3f84fa3a92d1d9b9c21c5eb0b44359a/client.go#L52-L57

If no custom context is provided, it will just fall back to the current behaviour.

This would be pulled through to certmagic as well, where a custom context can be passed to the ACMEIssuer:

https://github.com/caddyserver/certmagic/blob/65cf36cce160babd154a5c77fe266fdcb444978f/acmeissuer.go#L43-L157

Then, when creating the acmez.Client, the context can just be set:

https://github.com/caddyserver/certmagic/blob/65cf36cce160babd154a5c77fe266fdcb444978f/acmeclient.go#L258-L282

I think then it is just about documenting this feature so users understand the consequence, and that one has to cancel this context on e.g. application shutdown.

If you don't want to have the API changed, however, I think a new context with a sane (possibly configurable) timeout is good as well.

aplr avatar Oct 24 '24 13:10 aplr