vcert
Provide the ability to reset the certificate object in Venafi TPP
BUSINESS PROBLEM If the downstream CA service is down for any reason, Venafi TPP changes the status of the certificate object to Error.
The scenario to reproduce this is simple:
cert-manager--->Venafi TPP--->MSCA
cert-manager and MSCA could be replaced with any consumer and provider.
- Request a certificate using cert-manager
- Stop the MSCA service
- Trigger a certificate renewal using cert-manager
- As expected, the renewal fails because Venafi cannot reach the downstream MSCA service
- Renewal fails with a proper error that can be seen in the CertificateRequest resource
- Venafi TPP, during its attempt to reach MSCA, fails and marks the status of the certificate object as "Error"
- Start the MSCA service
- Trigger a manual renewal again
- This fails and there is no recovery. Unless the certificate object is reset, any attempt to renew this certificate results in Error
PROPOSED SOLUTION vCert provides a mechanism to reset the certificate object so consumers can attempt to heal the situation.
CURRENT ALTERNATIVES Currently, the only way to recover is to manually reset the certificate object in the UI and retry a renewal via API.
Perhaps this could be exposed in the vcert library, and also as a command line option?
E.g. a reset subcommand, something like:
USAGE:
vcert [global options] command [command options] [arguments...]
ACTIONS:
gencsr To generate a certificate signing request (CSR)
enroll To enroll a certificate
pickup To retrieve a certificate
renew To renew a certificate
reset To reset a certificate state
revoke To revoke a certificate
Requiring an id to target which certificate to reset:
vcert reset -u <HOST> -t <TOKEN> --id <CERT_ID>
Thank you for raising this issue @sitaramkm. This problem is a side effect of TPP's object-based design and does not apply to Venafi as a Service. As such, that gives me reservations about adding a new action (or a new option to an existing action) as @hawksight proposed. Instead I think this "self-healing" should just be how VCert behaves. No certificate request should ever be influenced by the success or failure of any previous certificate request.
Guidance for anyone contributing this update to the project:
It is important we avoid introducing additional API calls for the majority case (i.e., where the request succeeds because there is no existing certificate object in error). That means not adding logic before every request to check whether a certificate object already exists and, if so, whether it is "in error". Instead the reset/retry logic should only be triggered by the POST /vedsdk/certificates/retrieve call failing with an HTTP 500 error response (it will have the following body after making an API request for a certificate object that was in error).
{
"Stage": 500,
"Status": "WebSDK CertRequest Module Requested Certificate"
}
That error confirms there was a certificate object in error state prior to the current request being made, and it should trigger a POST /vedsdk/certificates/reset call with "Restart": false followed by repeating the POST /vedsdk/certificates/request call with the original payload. This won't guarantee the certificate request will be successful but it will ensure that the current certificate request is always attempted.
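For anyone picking this up, the flow described above could be sketched roughly as follows. This is only an illustration of the control flow, not vcert's actual implementation: the tppAPI interface, the errObjectInError sentinel, the requestWithReset function and the fakeTPP stand-in are all hypothetical names invented for this example. The second retrieve after the repeated request is also an assumption on my part.

```go
package main

import (
	"errors"
	"fmt"
)

// errObjectInError is a hypothetical sentinel standing in for the HTTP 500
// response from POST /vedsdk/certificates/retrieve described above.
var errObjectInError = errors.New("certificate object in error state")

// tppAPI is a hypothetical interface (not vcert's real API) covering the
// three Web SDK calls involved.
type tppAPI interface {
	Request(payload string) (dn string, err error) // POST /vedsdk/certificates/request
	Retrieve(dn string) (pem string, err error)    // POST /vedsdk/certificates/retrieve
	Reset(dn string, restart bool) error           // POST /vedsdk/certificates/reset
}

// requestWithReset only resets and retries when retrieval reports that the
// certificate object was already in error; the success path makes no
// additional API calls.
func requestWithReset(api tppAPI, payload string) (string, error) {
	dn, err := api.Request(payload)
	if err != nil {
		return "", fmt.Errorf("request: %w", err)
	}
	pem, err := api.Retrieve(dn)
	if err == nil {
		return pem, nil // majority case: nothing extra to do
	}
	if !errors.Is(err, errObjectInError) {
		return "", fmt.Errorf("retrieve: %w", err)
	}
	// Reset with "Restart": false, then repeat the original request.
	if err := api.Reset(dn, false); err != nil {
		return "", fmt.Errorf("reset: %w", err)
	}
	if dn, err = api.Request(payload); err != nil {
		return "", fmt.Errorf("request after reset: %w", err)
	}
	// Assumption: retrieve once more after the repeated request.
	pem, err = api.Retrieve(dn)
	if err != nil {
		return "", fmt.Errorf("retrieve after reset: %w", err)
	}
	return pem, nil
}

// fakeTPP is an in-memory stand-in used only to exercise the sketch.
type fakeTPP struct {
	inError          bool
	requests, resets int
}

func (f *fakeTPP) Request(string) (string, error) {
	f.requests++
	return "example-dn", nil
}

func (f *fakeTPP) Retrieve(string) (string, error) {
	if f.inError {
		return "", errObjectInError
	}
	return "PEM", nil
}

func (f *fakeTPP) Reset(_ string, restart bool) error {
	f.resets++
	f.inError = false
	return nil
}
```

Calling requestWithReset(&fakeTPP{inError: true}, "csr") exercises the reset path, while a fakeTPP that is not in error goes through a single request and retrieve. As noted above, the retry does not guarantee success; it only ensures the current request is attempted.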
This issue was fixed in vcert v4.23.0 (https://github.com/Venafi/vcert/pull/269). Regarding cert-manager, the issue will be fixed as part of 1.11 (https://github.com/cert-manager/cert-manager/pull/5674).
If you are hitting one of the two error messages:
unable to retrieve: Unexpected status code on TPP Certificate Retrieval. Status: 500 Certificate has encountered an error while processing, Status: WebSDK CertRequest Module Requested Certificate, Stage: 400.
or
unable to retrieve: Unexpected status code on TPP Certificate Retrieval. Status: 500 Certificate has encountered an error while processing, Status: This certificate cannot be processed while it is in an error state. Fix any errors, and then click Retry., Stage: 400.
then I recommend that you upgrade to vcert v4.23.0 (the stage number doesn't matter in the above messages).
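If you want to detect these two messages programmatically (for example in logs), a small matcher along these lines might help. It is a rough sketch based only on the message text quoted above; looksLikeObjectInError and resettableStatuses are names made up for this example and do not exist in vcert. The stage number is deliberately ignored.

```go
package main

import "strings"

// resettableStatuses holds the two Status texts quoted in the messages above.
var resettableStatuses = []string{
	"WebSDK CertRequest Module Requested Certificate",
	"This certificate cannot be processed while it is in an error state.",
}

// looksLikeObjectInError reports whether msg matches one of the two error
// messages above, regardless of the Stage value.
func looksLikeObjectInError(msg string) bool {
	const prefix = "Status: 500 Certificate has encountered an error while processing"
	if !strings.Contains(msg, prefix) {
		return false
	}
	for _, s := range resettableStatuses {
		if strings.Contains(msg, "Status: "+s) {
			return true
		}
	}
	return false
}
```

Matching on message text is brittle, so treat this as a diagnostic aid rather than something to build retry logic on.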
This use case was addressed by v4.23.0
@luispresuelVenafi Could we re-open this issue? Although this issue was fixed in 4.23.0, it was then reverted in VCert 4.24.0. More context is available in https://github.com/Venafi/vcert/issues/273#issuecomment-1556938953.
@maelvls sure. This is still a pending issue due to the revert.
This has been partially fixed in VCert 5.0.0 with the introduction of the ResetCertificate Go function (https://github.com/Venafi/vcert/pull/295).
No CLI command was added though (e.g., vcert reset). I know that @hawksight talked about vcert reset; is it still needed?