Provide the ability to reset the certificate object in Venafi TPP

Open sitaramkm opened this issue 2 years ago • 7 comments

BUSINESS PROBLEM If the downstream CA service is down for any reason, Venafi TPP changes the status of the certificate object to Error.

The scenario to reproduce this is simple: cert-manager ---> Venafi TPP ---> MSCA

cert-manager and MSCA could be replaced with any consumer and provider.

  • Request a certificate using cert-manager
  • Stop the MSCA service
  • Trigger a certificate renewal using cert-manager
  • As expected the renewal fails because Venafi cannot reach the downstream MSCA service
  • Renewal fails with a proper error that can be seen in the CertificateRequest resource
  • Venafi TPP fails in its attempt to reach MSCA and marks the status of the certificate object as "Error"
  • Start the MSCA service
  • Trigger a manual renewal again.
  • This fails and there is no recovery. Unless the certificate object is reset, any attempt to renew this certificate results in Error

PROPOSED SOLUTION vCert should provide a mechanism to reset the certificate object so consumers can attempt to heal the situation.

CURRENT ALTERNATIVES Currently, the only way to recover is to manually reset the certificate object in the UI and retry a renewal via API.

sitaramkm avatar Jul 06 '22 18:07 sitaramkm

Perhaps this could be exposed in the vcert library, but also as a command line option?

E.g. a reset sub command something like:

USAGE:
   vcert [global options] command [command options] [arguments...]
   
ACTIONS:

   gencsr       To generate a certificate signing request (CSR)
   enroll       To enroll a certificate
   pickup       To retrieve a certificate
   renew        To renew a certificate
   reset        To reset a certificate state
   revoke       To revoke a certificate

Requiring an ID to target which certificate to reset:

vcert reset -u <HOST> -t <TOKEN> --id <CERT_ID> 

hawksight avatar Jul 19 '22 14:07 hawksight

Thank you for raising this issue @sitaramkm. This problem is a side effect of TPP's object-based design and does not apply to Venafi as a Service. As such, that gives me reservations about adding a new action (or a new option to an existing action) as @hawksight proposed. Instead I think this "self-healing" should just be how VCert behaves. No certificate request should ever be influenced by the success or failure of any previous certificate request.

Guidance for anyone contributing this update to the project:

It is important we avoid introducing additional API calls for the majority case (i.e., where the request succeeds because there is no existing certificate object in error). That means not adding logic before every request to check whether a certificate object already exists and, if so, whether it is "in error". Instead, the reset/retry logic should only be triggered by POST /vedsdk/certificates/retrieve failing with an HTTP 500 error response. When the API request was made for a certificate object that was in error, that response has the following body:

{
 "Stage": 500, 
 "Status": "WebSDK CertRequest Module Requested Certificate"
}

That error confirms there was a certificate object in an error state before the current request was made. It should trigger a POST /vedsdk/certificates/reset call with "Restart": false, followed by repeating the POST /vedsdk/certificates/request call with the original payload. This won't guarantee the certificate request will be successful, but it will ensure that the current certificate request is always attempted.

tr1ck3r avatar Sep 09 '22 15:09 tr1ck3r

This issue was fixed in vcert v4.23.0 (https://github.com/Venafi/vcert/pull/269). Regarding cert-manager, the issue will be fixed as part of 1.11 (https://github.com/cert-manager/cert-manager/pull/5674).

If you are hitting one of the two error messages:

unable to retrieve: Unexpected status code on TPP Certificate Retrieval. Status: 500 Certificate has encountered an error while processing, Status: WebSDK CertRequest Module Requested Certificate, Stage: 400.

or

unable to retrieve: Unexpected status code on TPP Certificate Retrieval. Status: 500 Certificate has encountered an error while processing, Status: This certificate cannot be processed while it is in an error state. Fix any errors, and then click Retry., Stage: 400.

then I recommend that you upgrade to vcert v4.23.0 (the stage number doesn't matter in the above messages).
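
If you want to check logs or error strings for these two messages programmatically, a minimal Go sketch could look like this. The helper name hitsErrorStateBug is hypothetical, and matching on message text is inherently brittle; it only checks the two Status texts quoted above and deliberately ignores the stage number.

```go
package main

import (
	"fmt"
	"strings"
)

// errorStateMarkers are the two Status texts from the messages quoted above.
var errorStateMarkers = []string{
	"WebSDK CertRequest Module Requested Certificate",
	"This certificate cannot be processed while it is in an error state",
}

// hitsErrorStateBug (hypothetical helper) reports whether msg looks like one
// of the two messages. The Stage number is not inspected at all.
func hitsErrorStateBug(msg string) bool {
	if !strings.Contains(msg, "Status: 500") {
		return false
	}
	for _, marker := range errorStateMarkers {
		if strings.Contains(msg, marker) {
			return true
		}
	}
	return false
}

func main() {
	msg := "unable to retrieve: Unexpected status code on TPP Certificate Retrieval. " +
		"Status: 500 Certificate has encountered an error while processing, " +
		"Status: WebSDK CertRequest Module Requested Certificate, Stage: 500."
	fmt.Println(hitsErrorStateBug(msg))
}
```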

maelvls avatar Jan 02 '23 12:01 maelvls

This use case was addressed by v4.23.0

luispresuelVenafi avatar Jan 02 '23 18:01 luispresuelVenafi

@luispresuelVenafi Could we re-open this issue? Although this issue was fixed in 4.23.0, it was then reverted in VCert 4.24.0. More context is available in https://github.com/Venafi/vcert/issues/273#issuecomment-1556938953.

maelvls avatar May 22 '23 09:05 maelvls

@maelvls sure. This is still a pending issue due to the revert.

luispresuelVenafi avatar May 22 '23 14:05 luispresuelVenafi

This has been partially fixed in VCert 5.0.0 with the introduction of the ResetCertificate Go function (https://github.com/Venafi/vcert/pull/295).

No CLI command was added though (e.g., vcert reset). I know that @hawksight talked about vcert reset. Is it still needed?

maelvls avatar Oct 06 '23 12:10 maelvls