envoy
envoy copied to clipboard
Emit metric for seconds since UNIX epoch of expiration for loaded certs
Commit Message: Emit metric for seconds since UNIX epoch of expiration for loaded certs Additional Description:
Introducing TLS/CA certificate metrics for seconds since epoch of expiration.
New metric with format:
"cluster.<cluster_name>.ssl.certificate.<cert_name>.expiration_unix_time_seconds" [GAUGE]
"listener.<address>.ssl.certificate.<cert_name>.expiration_unix_time_seconds" [GAUGE]
CA certificates & TLS certificates Dynamic certs (from SDS) -> cert name will be cert name from SDS Static certs (from file-path) -> cert name will be the secret name. Inline certs (from inline certs in configuration) -> name will be prefixed with "unnamed_certs_" + first 8 chars of the hash of the cert. Inline ca cert -> name will be prefixed with "unnamed_ca_cert_" + first 8 chars of the hash of the cert.
Risk Level: Low Testing: Unit/Integration tests Docs Changes: Included Release Notes: Included Platform Specific Features: [Optional Runtime guard:] N/A [Optional Fixes #Issue] #27396 [Optional Fixes commit #PR or SHA] [Optional Deprecated:] N/A [Optional API Considerations:]
/retest
@ggreenway Do you mind taking a look at my PR? 😄
/assign-from @envoyproxy/first-pass-reviewers
@envoyproxy/first-pass-reviewers assignee is @silverstar194
/retest
/retest
/retest
/retest
/retest
/retest
CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).
envoyproxy/dependency-shepherds assignee is @RyanTheOptimist
CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to
(bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch). envoyproxy/dependency-shepherds assignee is @RyanTheOptimist🐱
sorry, I accidentally did a git merge upstream/main which pulled in several other changes. I reverted those.
/retest
/retest
Seems like getting an unrelated error from clang-tidy. I'm not sure how to fix it but was able to get green CI from retest though takes a few tries.
/retest
clang-tidy error in CI shows the following:
Error while processing /source/envoy/ssl/certificate_validation_context_config.h.
/build/bazel_root/base/execroot/envoy/./source/common/common/base_logger.h:7:10: error: 'spdlog/spdlog.h' file not found [clang-diagnostic-error]
#include "spdlog/spdlog.h"
.
.
.
I did add some code in certificate_validation_context_config.h but not sure how it relates with base_logger.h and not sure why 'spdlog/spdlog.h' file not found
Will ignore that failure for now.
clang-tidy error in CI shows the following:
Error while processing /source/envoy/ssl/certificate_validation_context_config.h. /build/bazel_root/base/execroot/envoy/./source/common/common/base_logger.h:7:10: error: 'spdlog/spdlog.h' file not found [clang-diagnostic-error] #include "spdlog/spdlog.h" . . .I did add some code in
certificate_validation_context_config.hbut not sure how it relates withbase_logger.hand not sure why'spdlog/spdlog.h' file not foundWill ignore that failure for now.
Yeah this is a known occasional failure mode in tidy. :/ Ignoring it is the right choice, alas.
/assign @ggreenway since this is TLS related
RE tidy not finding spdlog.h: I believe I've seen this before. The cause is BUILD files that don't fully list required dependencies. Normal compiles work because of transitive dependencies. But, for reasons I don't understand, clang-tidy is stricter. So a potential remedy is adding direct BUILD dependencies as indicated by the detailed tidy logs.
fyi, going on a short vacation and be back to work on this end of this month. @ggreenway
fyi, going on a short vacation and be back to work on this end of this month. @ggreenway
Sounds good; ping me when you get back and this is ready for review. And if stale-bot closes it I'll re-open.
/wait
Looks like there are merge conflicts, and DCO needs to be fixed. /wait
Addressed feedback. Ready for another round of review :grin:
/retest
/retest
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Commenting to bring out of stale. Will work on it next week. Had some other high priority work last few weeks so wasn't able to work on it.
Sorry for delay, @ggreenway I addressed your feedback. Let me know what you think. :)
One more high-level question: instead of seconds-until-expiration, what if we published it as the time in seconds-since-unix-epoch of the expiration. This has two advantages: it's not constantly changing and taking up more space in a TSDB, and envoy doesn't have to keep updating the value (it gets set once when we load the cert and read it's expiration time). I know that makes it different than the existing envoy metric for time until next cert expiration, but seems worth considering.
@ggreenway
Seems if we switch to seconds-since-unix-epoch of the expiration it would be similar to viewing the certs from the /certs endpoint?
I think our use case is we want to have alerts setup that points to specific certs to notify if the certs are expiring soon.
Do you think we can move forward with current approach and we can iterate on it in next PRs if we want to tweak it?