serving icon indicating copy to clipboard operation
serving copied to clipboard

[WIP] Stream logs from user pods during upgrades

Open mgencur opened this issue 2 years ago • 7 comments

This PR depends on https://github.com/knative/pkg/pull/2591 to be merged first

Upgrades tests make the user pods (which have queue-proxy container) restart due to updating the queue-proxy image. There can be temporary errors which disappear in the end. Without log streaming, the logs from user pods that were already shutdown are lost.

This PR pulls changes from knative.dev/pkg related to logstream which is now able to stream logs from user namespaces without filtering individual lines but taking into account only user-specified pods. This is because multiple tests are running withing the same namespace in parallel so we need to define which pods belong the the current test.

I'm sending the changes to knative.dev/pkg in parallel and will update the PR when/if they're merged. Currently the changes are hardcoded in the vendor directory.

Note: Streaming logs from the system namespace has been broken because the upgrade tests use multiple testing.T and streaming is ended after the "ContinualTests/Setup" phase is finished. We might fix it later when https://github.com/knative/pkg/issues/2421 is implemented.

Proposed Changes

  • Stream logs from user pods during upgrades
  • Print the logs on error in the test log.

Release Note


mgencur avatar Sep 12 '22 09:09 mgencur

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mgencur Once this PR has been reviewed and has the lgtm label, please assign dsimansk for approval by writing /assign @dsimansk in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

knative-prow[bot] avatar Sep 12 '22 09:09 knative-prow[bot]

Codecov Report

Base: 86.51% // Head: 86.48% // Decreases project coverage by -0.03% :warning:

Coverage data is based on head (439e2c6) compared to base (1c6a05d). Patch has no changes to coverable lines.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13293      +/-   ##
==========================================
- Coverage   86.51%   86.48%   -0.04%     
==========================================
  Files         196      196              
  Lines       14526    14544      +18     
==========================================
+ Hits        12567    12578      +11     
- Misses       1661     1666       +5     
- Partials      298      300       +2     
Impacted Files Coverage Δ
pkg/reconciler/configuration/configuration.go 82.93% <0.00%> (-2.27%) :arrow_down:
pkg/reconciler/route/traffic/traffic.go 92.55% <0.00%> (+0.08%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Sep 12 '22 09:09 codecov[bot]

/hold It works only for short strings, if the name is too long the names are truncated and Pod name doesn't include the name of the Service (ObjectNameForTest). Need to get concrete pods for the given service.

mgencur avatar Sep 12 '22 10:09 mgencur

When the name of the service is too long the names are as follows:

ksvc
serving-upgrades-continual-tests-setup-au-hwqgcxvu
serving-upgrades-continual-tests-setup-au-vycgximb

serverless-service:
serverlessservice/serving-upgrades-continual-tests-setup-au-hwqgcxvu-00001

deployments:
serving-upgrades-con8f7e50d9277d222233ee19efcde85c06-deployment
(remove the -deployment suffix)

pods:
pod/serving-upgrades-con8f7e50d9277d222233ee19efcde85c06-deplo4fss6

We could get the Pod names by getting the serverless-service, getting the Deployment it references, and removing the "-deployment" suffix. Then we'd get a general prefix for Pods for the given ksvc.

Another, maybe much simple way would be to simply define a short name for the ksvc in the test, instead of using test.ObjectNameForTest(t), we could have autoscale-sus and autoscale-sus-tbc and the Pods would then include the ksvc name by default (nothing would be truncated).

mgencur avatar Sep 12 '22 10:09 mgencur

Trying the shorter names with the latest commit.

I think using shorter names is desirable because when the name is based on the whole test name (e.g. TestServingUpgrades/ContinualTests/SetupAutoscaleSustainingTest) all pods in the namespace have the same prefix and it's not clear which pod belongs to which test. This is bad for debugging purposes. Background: We have AutoscaleSustainingTest and AutoscaleSustainingWithTBCTest which run in parallel. All the resulting ksvc/deployment/pods will start with serving-upgrades-continual-tests-setup-au- followed by generated random characters. The will be many pods and not clear where they belong. This is because the names are too long and are truncated.

mgencur avatar Sep 12 '22 10:09 mgencur

/retest

mgencur avatar Sep 12 '22 12:09 mgencur

@mgencur: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
build-tests_serving_main 439e2c614cf0f8a22d91420d8c245aa6c13d4930 link true /test build-tests
upgrade-tests_serving_main 439e2c614cf0f8a22d91420d8c245aa6c13d4930 link true /test upgrade-tests

Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

knative-prow[bot] avatar Sep 19 '22 08:09 knative-prow[bot]

This Pull Request is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen with /reopen. Mark as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Dec 19 '22 01:12 github-actions[bot]

I will reopen this when the related knative/pkg PR is merged.

mgencur avatar Jan 03 '23 13:01 mgencur