
When retrieving a certificate with a timeout, the underlying error is hidden

Open · barucoh opened this issue 3 years ago • 2 comments

PROBLEM SUMMARY

When retrieving a certificate with the timeout set to anything other than 0, the returned error includes neither the status code nor the actual errors that occurred while trying to retrieve the certificate. All we get is "Operation timed out", which for all we know could simply mean a slow network. When there was more than one retry, we need to know every error that occurred: specifically, for each retry attempt, the actual underlying error and its status code (if any).

This would greatly help with troubleshooting.

STEPS TO REPRODUCE

  • Create a certificate
  • Try to retrieve that certificate with a timeout > 0 (make sure the certificate cannot actually be retrieved, so that the retrieval request fails)

EXPECTED RESULTS

One error per retry attempt, each with its status code

ACTUAL RESULTS

Only a plain "Operation timed out" error

ENVIRONMENT DETAILS

COMMENTS/WORKAROUNDS

barucoh, May 12 '21 12:05

Hi @barucoh, it sounds like the feature you may be seeking is provided by the VCert CLI's --verbose parameter. When specified, it is intended to show the individual REST API calls being made and their results. For the following example, I made the request with the Venafi service stopped and then started it, so there would be a few failed retrieval attempts:

```
vcert enroll -u https://tpp.venafi.example -t tn1PwE1QTZorXmvnTowSyA== -z DevOps --cn vcert-cli.venafi.example --san-dns vcert-cli.venafi.example --no-prompt --verbose
vCert: 2021/05/12 21:13:53 Successfully connected to TPP
vCert: 2021/05/12 21:13:53 Got 200 OK status for POST https://tpp.venafi.example/vedsdk/certificates/checkpolicy
vCert: 2021/05/12 21:13:53 Successfully read zone configuration for DevOps
vCert: 2021/05/12 21:13:53 Successfully created request for vcert-cli.venafi.example
vCert: 2021/05/12 21:13:54 Got 200 OK status for POST https://tpp.venafi.example/vedsdk/certificates/request
vCert: 2021/05/12 21:13:55 Got 200 OK status for POST https://tpp.venafi.example/vedsdk/metadata/get
vCert: 2021/05/12 21:13:55 Successfully posted request for vcert-cli.venafi.example, will pick up by \VED\Policy\DevOps\vcert-cli.venafi.example
vCert: 2021/05/12 21:13:55 Got 202 Certificate \VED\Policy\DevOps\vcert-cli.venafi.example is queued for processing. status for POST https://tpp.venafi.example/vedsdk/certificates/retrieve
vCert: 2021/05/12 21:13:55 Issuance of certificate is pending...
vCert: 2021/05/12 21:14:00 Got 202 Certificate \VED\Policy\DevOps\vcert-cli.venafi.example is queued for processing. status for POST https://tpp.venafi.example/vedsdk/certificates/retrieve
vCert: 2021/05/12 21:14:00 Issuance of certificate is pending...
vCert: 2021/05/12 21:14:06 Got 202 Certificate \VED\Policy\DevOps\vcert-cli.venafi.example is queued for processing. status for POST https://tpp.venafi.example/vedsdk/certificates/retrieve
vCert: 2021/05/12 21:14:06 Issuance of certificate is pending...
vCert: 2021/05/12 21:14:11 Got 200 OK status for POST https://tpp.venafi.example/vedsdk/certificates/retrieve
vCert: 2021/05/12 21:14:11 Successfully retrieved request for \VED\Policy\DevOps\vcert-cli.venafi.example
```

The --timeout parameter specifies how long VCert will keep trying to retrieve the certificate after the request is submitted, since not all CAs are created equal and some issue certificates much faster than others. So the "operation timed out" result means VCert could not retrieve the certificate within the allowed timeframe (3 minutes by default); it has nothing to do with connections timing out. If you're regularly seeing connection timeouts even though the network latency between the Venafi API server and the system running VCert is under 100 ms, it might be worth opening a case with Venafi Customer Support to determine whether something is causing the API server to be unstable or underperform.

If your connectivity is unreliable, you might also consider working asynchronously: use the enroll action's --no-pickup option, followed by your own retry logic that invokes the pickup action. That way you can rely on non-zero exit codes from the command to tell you whether each attempt was unsuccessful.

Does that help?

tr1ck3r, May 13 '21 04:05

@tr1ck3r I'm actually not talking about the CLI but about the SDK itself. I have a Go client that uses the VCert SDK, and I call RetrieveCertificate on the Connector interface with the following struct as input:

```go
certificate.Request{
    PickupID: "some-id", // placeholder pickup ID
    Timeout:  time.Second * 20,
}
```

Then, no matter what the underlying error is, I never see it, because of this specific code: https://github.com/Venafi/vcert/blob/master/pkg/venafi/tpp/connector.go#L996-L1017

If a Timeout is set on the certificate.Request, the error is never returned to the user, whatever it is; the connector simply returns "Operation timed out".

barucoh, May 23 '21 14:05