acme2certifier

acme.sh fails with "Sign error, wrong status" when a2c ca_server.get_cert() fails with error: The NETBIOS connection with the remote host timed out.

Open okorsky opened this issue 1 year ago • 8 comments

When using acme.sh with a2c and mswcce_ca_handler.py, I see some strange behavior.

All VMs (a2c, acme.sh, acme-dns, MS CA and domain controller) are in the same network and can reach each other directly. This environment runs on Azure VMs, but I have also reproduced the same behavior with VMs running on VMware Workstation.

  • a2c finishes the validation and tries to enroll for a certificate from the MS CA using mswcce_ca_handler
  • the certificate gets issued by the MS CA, but a2c reports a NETBIOS connection error when it tries to retrieve the certificate
  • acme.sh exits with the error "Sign error, wrong status"
  • acme.sh tries again with a new request and everything goes through smoothly

Any idea what the root cause could be? Is it possible to retry fetching the certificate from the MS CA after the NETBIOS failure?

Logs from both acme.sh and a2c are attached: a2clog_amcesh.txt, acmeshlog.txt

okorsky · Apr 15 '24 19:04

I see this error from time to time in my lab as well. In the past I thought it was related to my setup (I am accessing the MS CA via an SSH tunnel), but that does not seem to be the case.

I will look into it over the coming days. However, are you saying that the same setup works fine with other ACME clients? Does the issue occur every time you use acme.sh?

grindsa · Apr 18 '24 05:04

I normally test with acme.sh, so I have mostly seen this error with acme.sh.

However, there was one instance where it also happened with cert-manager.

okorsky · Apr 29 '24 12:04

The error is appearing more and more often now, with both cert-manager and acme.sh.

Any ideas yet?

okorsky · May 08 '24 09:05

It's unlikely that the choice of ACME client would affect a server-side NETBIOS connection. The error in question is raised by the impacket library (https://github.com/fortra/impacket/blob/master/impacket/nmb.py#L285); however, that wording is simply the default message for that exception type, and the point where it is raised will vary.

Beware of cached name resolution if your target machine's IP address changes.

webprofusion-chrisc · May 08 '24 10:05

@webprofusion-chrisc I agree; I was just answering the earlier question about whether the error occurs with different clients.

I'm not sure cached name resolution is the issue here. The target machines (both the DC and the ADCS CA) have static IP addresses.

It's also worth noting:

  • on the ADCS CA, the certificate gets issued (so the request does reach the CA)
  • right after the error, when the client tries again (within seconds), it gets a certificate successfully

As a temporary workaround, I was thinking of inspecting the exception raised at https://github.com/grindsa/acme2certifier/blob/bc9deb61aa8d414b3e33298f08a0fc01555f0d4d/examples/ca_handler/mswcce_ca_handler.py#L241-L243 and, if the error is "The NETBIOS connection with the remote host timed out", simply running cert_raw = convert_byte_to_string(request.get_cert(convert_string_to_byte(csr))) again before failing.
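The retry idea described above can be sketched generically as follows. This is a hypothetical helper, not part of mswcce_ca_handler.py: get_cert_with_retry and fetch are illustrative names, and matching on the message text mirrors the proposal rather than catching a specific impacket exception class. Note that, as reported in the follow-up comment, a plain retry did not resolve the issue in this environment.

```python
import time

# Lower-cased fragment of impacket's default timeout message (matched by text,
# as in the proposal above; an assumption, not a specific exception class).
TIMEOUT_MSG = "netbios connection with the remote host timed out"


def get_cert_with_retry(fetch, retries=2, delay=2.0):
    """Call fetch(); on a NETBIOS timeout, wait `delay` seconds and try again.

    `fetch` is a hypothetical zero-argument callable wrapping something like
    request.get_cert(...). Errors that are not NETBIOS timeouts propagate
    immediately; after `retries` extra attempts the last timeout is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception as err:
            is_timeout = TIMEOUT_MSG in str(err).lower()
            if not is_timeout or attempt == retries:
                raise
            time.sleep(delay)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_fetch():
        # Simulated CA call: times out once, then succeeds.
        calls["n"] += 1
        if calls["n"] == 1:
            raise RuntimeError("The NETBIOS connection with the remote host timed out.")
        return "-----BEGIN CERTIFICATE-----..."

    print(get_cert_with_retry(flaky_fetch, delay=0))
```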

okorsky · May 08 '24 10:05

The workaround didn't work after all.

I tried catching the error, sleeping for 2 seconds and then building the request again (request = self.request_create()), but the same error occurred on the retry.

okorsky · May 13 '24 22:05

Hi,

Sorry for not commenting earlier, but I have been quite busy the last few weeks.

I agree that the error is most likely not related to the ACME client. I asked because I am looking for a reliable way to reproduce the issue.

Let me give it another try over the weekend.

/G.

grindsa · May 17 '24 04:05

Hi,

Sorry, I am still not able to reproduce the issue. However, it's worth trying whether increasing the timeout of the DCE connection helps you overcome the issue. The default is 5 seconds; a higher value may work better in your environment.

I have updated the handler and introduced a timeout option in acme_srv.cfg to make the timeout configurable:

[CAhandler]
...
timeout: 20

Please give it a try with the updated handler and check whether things get better.
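For illustration, a handler could read such an option with Python's standard configparser, falling back to the 5-second default when the option is absent. This is a minimal sketch under that assumption, not the actual acme2certifier code; load_timeout and DEFAULT_TIMEOUT are hypothetical names, and how the value is then passed to the DCE connection is not shown.

```python
import configparser

DEFAULT_TIMEOUT = 5  # seconds; the default mentioned above


def load_timeout(cfg_text: str) -> int:
    """Return the [CAhandler] timeout from acme_srv.cfg-style text, or the default."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback covers both a missing section and a missing option
    return parser.getint("CAhandler", "timeout", fallback=DEFAULT_TIMEOUT)


if __name__ == "__main__":
    print(load_timeout("[CAhandler]\ntimeout: 20\n"))  # 20
    print(load_timeout("[CAhandler]\n"))  # 5
```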

grindsa · May 19 '24 07:05

Closed due to inactivity. If you would like to follow up, please reopen.

grindsa · Jun 21 '24 08:06

I apologize for the late response.

Actually, the timeout option you provided seems to have fixed the problem. I applied it a few weeks ago and the problem has not occurred since.

Thank you so much for the fix :)

okorsky · Jun 21 '24 11:06