serving
serving copied to clipboard
Test for internal TLS certificate rotation
Add test for TLS certificate rotation
Proposed Changes
- Add test for TLS certificate rotation (including rotation of certificates for Activator, queue-proxy, Ingress)
- Re-implement the local function
scanPodLogsas a more versatileWaitForLogthat can wait for a given condition. - Fix a bug related to
ServingFlags.SkipCleanupOnFail. The cleanup should really be skipped only if the test fails. This was missed during reviews on the original PR - Function
getCertManagerCAnow also waits for the Secret to exist rather than failing immediately
Release Note
/hold
/unhold
The test fails with:
system_internal_tls_test.go:234: Deleting trust-bundle ConfigMap
stream.go:305: D 11:08:49.430 autoscaler-7d8fbf5b9-m86j7 [metrics/stats_scraper.go:252] [serving-tests/tls-certificate-rotation-ebvlcsmm-00001] |OldPods| = 1, |YoungPods| = 0
spoof.go:110: Spoofing tls-certificate-rotation-ebvlcsmm.serving-tests.example.com -> 34.138.175.134
system_internal_tls_test.go:254: The endpoint http://tls-certificate-rotation-ebvlcsmm.serving-tests.example.com/ didn't serve the expected text "Hello World! How about some tasty noodles?": response: status: 503, body: upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end:TLS_error_end, headers: map[Content-Length:[230] Content-Type:[text/plain] Date:[Fri, 17 May 2024 11:08:49 GMT] Server:[istio-envoy] Zipkin_trace_id:[d99519b5453a913fa98600ab36eea963]] did not pass checks: status = 503 503 Service Unavailable, want one of: [200]
@ReToCode I suppose the certificates might not be properly reloaded yet after deleting the routing-serving-certs secret and before we remove the "trust-bundle" configmap?
The test passed when I added waiting for ca.crt in routing-serving-certs. Will try again during reviews.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 84.74%. Comparing base (
b5455c9) to head (4e5bd95). Report is 1 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #15217 +/- ##
==========================================
- Coverage 84.78% 84.74% -0.04%
==========================================
Files 218 218
Lines 13479 13479
==========================================
- Hits 11428 11423 -5
- Misses 1685 1687 +2
- Partials 366 369 +3
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@ReToCode this is ready for another review.
/override "style / suggester / shell"
@ReToCode: ReToCode unauthorized: /override is restricted to Repo administrators.
In response to this:
/override "style / suggester / shell"
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@dprotaso mind overriding the shell-check? IMHO, this should be in a separate PR, as only the time changed in this one.
@ReToCode @dprotaso OK. I went ahead and fixed the shell check.
/lgtm /approve
@ReToCode Hmm. The latest error looks genuine:
system_internal_tls_test.go:225: Deleting secret in system namespace
spoof.go:110: Spoofing tls-certificate-rotation-jryhlssh.serving-tests.example.com -> 35.231.189.127
stream.go:305: D 08:40:20.489 autoscaler-56bfd47c76-4m7mn [statforwarder/processor.go:119] [serving-tests/tls-certificate-rotation-jryhlssh-00001] Forward stat of bucket autoscaler-bucket-07-of-10 to the holder autoscaler-56bfd47c76-dx6fr_10.120.4.6
stream.go:305: D 08:40:20.489 autoscaler-56bfd47c76-dx6fr [statforwarder/processor.go:64] [serving-tests/tls-certificate-rotation-jryhlssh-00001] Accept stat as owner of bucket autoscaler-bucket-07-of-10
system_internal_tls_test.go:231: Deleting secret in ingress namespace
spoof.go:110: Spoofing tls-certificate-rotation-jryhlssh.serving-tests.example.com -> 35.231.189.127
stream.go:305: D 08:40:20.605 autoscaler-56bfd47c76-dx6fr [statforwarder/processor.go:64] [serving-tests/tls-certificate-rotation-jryhlssh-00001] Accept stat as owner of bucket autoscaler-bucket-07-of-10
system_internal_tls_test.go:237: Deleting old certificate from trust-bundle ConfigMap
stream.go:305: D 08:40:20.707 autoscaler-56bfd47c76-dx6fr [metrics/stats_scraper.go:252] [serving-tests/tls-certificate-rotation-jryhlssh-00001] |OldPods| = 1, |YoungPods| = 0
spoof.go:110: Spoofing tls-certificate-rotation-jryhlssh.serving-tests.example.com -> 35.231.189.127
system_internal_tls_test.go:266: The endpoint http://tls-certificate-rotation-jryhlssh.serving-tests.example.com didn't serve the expected text "Hello World! How about some tasty noodles?": response: status: 503, body: upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end:TLS_error_end, headers: map[Content-Length:[230] Content-Type:[text/plain] Date:[Tue, 28 May 2024 08:40:20 GMT] Server:[istio-envoy] Zipkin_trace_id:[95ccff430d23ad30786fccb1f81036fa]] did not pass checks: status = 503 503 Service Unavailable, want one of: [200]
So, after deleting the old cert from the trust-bundle there's an error with TLS. Any idea why this could happen?
@ReToCode As you mentioned here, we should not be deleting the old cert from the trust-bundle ConfigMap because the test is quick and the certificates might not be rotated yet. So, let's keep the ConfigMap?
@ReToCode As you mentioned https://github.com/knative/serving/pull/15217#discussion_r1607753785, we should not be deleting the old cert from the trust-bundle ConfigMap because the test is quick and the certificates might not be rotated yet. So, let's keep the ConfigMap?
+1, yes that could take a while to fully complete. Makes not much sense to wait for that in an e2e test.
/lgtm /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: mgencur, ReToCode
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [ReToCode]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
IMHO, we can skip certmanager-integration-tests_serving_main, as per https://github.com/knative/serving/pull/15261
/override "certmanager-integration-tests_serving_main"
@dprotaso: Overrode contexts on behalf of dprotaso: certmanager-integration-tests_serving_main
In response to this:
/override "certmanager-integration-tests_serving_main"
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@mgencur: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| certmanager-integration-tests_serving_main | 4e5bd956f960b51d366571ddb4022e814e63c297 | link | unknown | /test certmanager-integration-tests |
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.