envoy icon indicating copy to clipboard operation
envoy copied to clipboard

Emit metric for seconds since UNIX epoch of expiration for loaded certs

Open PeterL328 opened this issue 2 years ago • 89 comments

Commit Message: Emit metric for seconds since UNIX epoch of expiration for loaded certs Additional Description:

Introducing TLS/CA certificate metrics for seconds since epoch of expiration.

New metric with format:

"cluster.<cluster_name>.ssl.certificate.<cert_name>.expiration_unix_time_seconds" [GAUGE]
"listener.<address>.ssl.certificate.<cert_name>.expiration_unix_time_seconds" [GAUGE]

CA certificates & TLS certificates Dynamic certs (from SDS) -> cert name will be cert name from SDS Static certs (from file-path) -> cert name will be the secret name. Inline certs (from inline certs in configuration) -> name will be prefixed with "unnamed_certs_" + first 8 chars of the hash of the cert. Inline ca cert -> name will be prefixed with "unnamed_ca_cert_" + first 8 chars of the hash of the cert.

Risk Level: Low Testing: Unit/Integration tests Docs Changes: Included Release Notes: Included Platform Specific Features: [Optional Runtime guard:] N/A [Optional Fixes #Issue] #27396 [Optional Fixes commit #PR or SHA] [Optional Deprecated:] N/A [Optional API Considerations:]

PeterL328 avatar May 30 '23 09:05 PeterL328

/retest

PeterL328 avatar Jun 17 '23 04:06 PeterL328

@ggreenway Do you mind taking a look at my PR? 😄

PeterL328 avatar Jun 17 '23 06:06 PeterL328

/assign-from @envoyproxy/first-pass-reviewers

zuercher avatar Jun 21 '23 22:06 zuercher

@envoyproxy/first-pass-reviewers assignee is @silverstar194

:cat:

Caused by: a https://github.com/envoyproxy/envoy/pull/27687#issuecomment-1601761407 was created by @zuercher.

see: more, trace.

/retest

PeterL328 avatar Jun 23 '23 00:06 PeterL328

/retest

PeterL328 avatar Jun 23 '23 01:06 PeterL328

/retest

PeterL328 avatar Jun 23 '23 08:06 PeterL328

/retest

PeterL328 avatar Jun 24 '23 08:06 PeterL328

/retest

PeterL328 avatar Jun 24 '23 10:06 PeterL328

/retest

PeterL328 avatar Jun 24 '23 10:06 PeterL328

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch). envoyproxy/dependency-shepherds assignee is @RyanTheOptimist

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/27687 was synchronize by PeterL328.

see: more, trace.

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch). envoyproxy/dependency-shepherds assignee is @RyanTheOptimist

🐱

sorry, I accidentally did a git merge upstream/main which pulled in several other changes. I reverted those.

PeterL328 avatar Jun 24 '23 11:06 PeterL328

/retest

PeterL328 avatar Jun 24 '23 12:06 PeterL328

/retest

PeterL328 avatar Jun 24 '23 21:06 PeterL328

Seems like getting an unrelated error from clang-tidy. I'm not sure how to fix it but was able to get green CI from retest though takes a few tries.

PeterL328 avatar Jun 25 '23 03:06 PeterL328

/retest

PeterL328 avatar Jun 25 '23 03:06 PeterL328

clang-tidy error in CI shows the following:

Error while processing /source/envoy/ssl/certificate_validation_context_config.h.

/build/bazel_root/base/execroot/envoy/./source/common/common/base_logger.h:7:10: error: 'spdlog/spdlog.h' file not found [clang-diagnostic-error]
#include "spdlog/spdlog.h"
.
.
.

I did add some code in certificate_validation_context_config.h but not sure how it relates with base_logger.h and not sure why 'spdlog/spdlog.h' file not found

Will ignore that failure for now.

PeterL328 avatar Jun 25 '23 04:06 PeterL328

clang-tidy error in CI shows the following:

Error while processing /source/envoy/ssl/certificate_validation_context_config.h.

/build/bazel_root/base/execroot/envoy/./source/common/common/base_logger.h:7:10: error: 'spdlog/spdlog.h' file not found [clang-diagnostic-error]
#include "spdlog/spdlog.h"
.
.
.

I did add some code in certificate_validation_context_config.h but not sure how it relates with base_logger.h and not sure why 'spdlog/spdlog.h' file not found

Will ignore that failure for now.

Yeah this is a known occasional failure mode in tidy. :/ Ignoring it is the right choice, alas.

RyanTheOptimist avatar Jun 26 '23 18:06 RyanTheOptimist

/assign @ggreenway since this is TLS related

RyanTheOptimist avatar Jun 26 '23 18:06 RyanTheOptimist

RE tidy not finding spdlog.h: I believe I've seen this before. The cause is BUILD files that don't fully list required dependencies. Normal compiles work because of transitive dependencies. But, for reasons I don't understand, clang-tidy is stricter. So a potential remedy is adding direct BUILD dependencies as indicated by the detailed tidy logs.

jmarantz avatar Jun 30 '23 11:06 jmarantz

fyi, going on a short vacation and be back to work on this end of this month. @ggreenway

PeterL328 avatar Jul 07 '23 19:07 PeterL328

fyi, going on a short vacation and be back to work on this end of this month. @ggreenway

Sounds good; ping me when you get back and this is ready for review. And if stale-bot closes it I'll re-open.

/wait

ggreenway avatar Jul 10 '23 15:07 ggreenway

Looks like there are merge conflicts, and DCO needs to be fixed. /wait

RyanTheOptimist avatar Jul 31 '23 13:07 RyanTheOptimist

Addressed feedback. Ready for another round of review :grin:

PeterL328 avatar Aug 01 '23 11:08 PeterL328

/retest

PeterL328 avatar Aug 01 '23 13:08 PeterL328

/retest

PeterL328 avatar Aug 01 '23 18:08 PeterL328

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Sep 02 '23 00:09 github-actions[bot]

Commenting to bring out of stale. Will work on it next week. Had some other high priority work last few weeks so wasn't able to work on it.

PeterL328 avatar Sep 02 '23 00:09 PeterL328

Sorry for delay, @ggreenway I addressed your feedback. Let me know what you think. :)

PeterL328 avatar Sep 20 '23 00:09 PeterL328

One more high-level question: instead of seconds-until-expiration, what if we published it as the time in seconds-since-unix-epoch of the expiration. This has two advantages: it's not constantly changing and taking up more space in a TSDB, and envoy doesn't have to keep updating the value (it gets set once when we load the cert and read it's expiration time). I know that makes it different than the existing envoy metric for time until next cert expiration, but seems worth considering.

@ggreenway Seems if we switch to seconds-since-unix-epoch of the expiration it would be similar to viewing the certs from the /certs endpoint? I think our use case is we want to have alerts setup that points to specific certs to notify if the certs are expiring soon. Do you think we can move forward with current approach and we can iterate on it in next PRs if we want to tweak it?

PeterL328 avatar Sep 21 '23 10:09 PeterL328