certificates Changing ACME server host or port causes existing accounts to craft malformed requests

Subject of the issue

Please note, I am not sure if this is a bug. It is just unexpected behavior I found after wasting a little of Max Furman's time (sorry about that) troubleshooting #193.

Basically, the kid used with JWS contain the scheme, host, port, and path (i.e. https://localhost:9000/acme/myca/account/...) from the ACME server's configuration at the time.

However, if the ACME server changes any part of the address, then the kid prefix check fails here: https://github.com/smallstep/certificates/blob/6476eb45a7e97ac43a37da7ab4893555625de249/acme/api/middleware.go#L324-L331

Steps to reproduce

Create an ACME account with the server at one URL
Move the ACME server to a new URL (any of scheme, host, port, or path).
Try another ACME transaction: urn:ietf:params:acme:error:malformed - The request message was malformed

Expected behaviour

I am not sure. Maybe that error is expected.

Actual behaviour

urn:ietf:params:acme:error:malformed - The request message was malformed

Additional context

I do think this is unexpected, especially with internal/development ACME servers, since in my case I originally had the account created at localhost:9000 then set up a reverse proxy for the ACME server at localhost:9001 and started pointing the client to use that one, but reuse the same account. Since the client stores the account kid from when it created the account, it retained the localhost:9000 value.

Essentially, this means the ACME server can never change its address after it is being used, unless new accounts are created or clients change their stored account locations to reflect the new server address. ~~Maybe that is the best way to handle it.~~ Edit: RFC 8555 suggests otherwise, see discussion below.

I am not familar enough with JWK/JWS to know the value of the prefix check, but I assume it wasn't invented for nothing. So this might very well be on the client to fix.

(Incidentally, setting STEPDEBUG=1 was not enough to emit detailed error messages. I had to add my own debug logging to find this.)

Again, I am not sure if this is a server bug. Wanted to open a discussion anyway.

Jul 01 '21 19:07 mholt

Thanks @mholt. This is a fun one.

Spent sometime further debugging this with @mholt and I'm just going to update the workflow / progression of the unexpected behavior with our findings.

When you create an ACME account most clients will store the account_url with that accounts metadata. The account_url has the CA_URL inside of it, e.g. https://localhost:9001/acme/acme/a ccount/DLpad8wWi87rFMG3Qe8VsuLLBNxVga3y, and it's also 1:1 associated with the CA hostname. So, if the CA hostname changes (either because the user is using a proxy, or because the hostname actually changes) most cli ACME clients will create a new account because they consider this new ACME hostname a completely different CA. IF you try to use an ACME account created with an old hostname to make new ACME transactions (e.g. create a new order), the request will fail when comparing the kid sent in the request with the current hostname (because your ACME client will send the original account_url which will no longer match).

Reasons to want to use the old account vs. just creating a new one when the hostname changes:

there may be metadata associated with the account e.g. rate limiting, EAB, etc.
clients that try to reuse an existing account despite the hostname change.

A possible fix here would be to store the original account_url and compare it to the incoming kid. This solution would more closely align with the spec which states:

For all other requests, the request is signed using an existing
   account, and there MUST be a "kid" field.  This field MUST contain
   the account URL received by POSTing to the newAccount resource.

https://datatracker.ietf.org/doc/html/rfc8555#section-6.2.

Jul 01 '21 20:07 dopey

we had the same issue. had to delete and then register a new account using the acme client that the end user was using (in our case acme.sh)

Aug 30 '21 16:08 TheSecMaven

Good to know that this is affecting multiple users. Will help with prioritization.

Aug 30 '21 20:08 dopey

Is something going on with it? In my case amce is deployed as a K8S POD, so imagine what happens with all cert-manager issuers when such a pod is being restarted (due to standard autoscalling jobs, or other thing)? Every client (issuer in cert-manager terminology) must be restarted (in my case removed and rbeing recreated)?

Dec 22 '22 14:12 MaciekLeks

Is something going on with it? In my case amce is deployed as a K8S POD, so imagine what happens with all cert-manager issuers when such a pod is being restarted (due to standard autoscalling jobs, or other thing)? Every client (issuer in cert-manager terminology) must be restarted (in my case removed and rbeing recreated)?

In my case: The bug was on k8s side - acme pod did not persist data (db directory of smallstep acme server) between restarts. Remember: ACME server holds client(cert-manager's issuer) data between restarts - not the client (cert-manager's issuer).

Jan 04 '23 10:01 MaciekLeks

Would just like to pop in and see if this is scheduled to be discussed or worked on at all.

Just had a report of a related issue that I think is the same underlying cause: they are proxying the ACME server and the responses from the ACME server all write out the "wrong" addresses because it doesn't know it is being proxied. It seems like being able to configure the "external" host -- or better yet, read it from the Host or X-Forwarded-Host header -- would be useful!

Maybe it is slightly different, but it reminded me of this issue anyway.

Mar 07 '23 22:03 mholt

We'll discuss it again tomorrow morning and I'll update back here with prioritization.

Mar 07 '23 22:03 dopey

Cool, thanks. I'm willing to help out if needed. For example maybe it's some new configuration parameter, I am totally down to integrate with that. Thanks again for these libs!

Mar 07 '23 22:03 mholt

Hey @mholt sorry for the delay. We had a lot of open source issues / PRs this week and it took some extra time to get through the triage.

It's been a while since I reviewed this issue, but based on my memory the issue is that I stored the original host/port value as part of the account details. Then, if/when those don't match we fail the request. Unfortunately fixing this will probably mean updating the code in a few places and fixing a bunch of broken tests. I'm probably best suited to do that , and I also made the mess so I'll clean it up.

I've got some product work that needs to get done ASAP for the next week to week and a half, but I should be able to dig into this after.

Mar 09 '23 19:03 dopey

@dopey Oh no problem. This is not really urgent; I was just reminded of it the other day.

Totally understand what it's like to fix something like this -- I'm about to take a deep breath and work on something similar with Caddy that needs fixing, heh.

I think you do great work! Thanks again!

Mar 10 '23 16:03 mholt

certificates certificates copied to clipboard

Changing ACME server host or port causes existing accounts to craft malformed requests

Subject of the issue

Steps to reproduce

Expected behaviour

Actual behaviour

Additional context

certificates
certificates copied to clipboard