OCPBUGS-38388: Fail on FailedToLease events for kubelet log collector
/jira refresh
@kannon92: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.
In response to this:
/jira refresh
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@kannon92: This pull request references Jira Issue OCPBUGS-38388, which is invalid:
- expected the bug to target either version "4.18." or "openshift-4.18.", but it targets "4.17.z" instead
Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.
The bug has been updated to refer to the pull request using the external bug tracker.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh
@kannon92: This pull request references Jira Issue OCPBUGS-38388, which is invalid:
- expected the bug to target either version "4.18." or "openshift-4.18.", but it targets "4.17.z" instead
Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.
In response to this:
/jira refresh
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh
@kannon92: This pull request references Jira Issue OCPBUGS-38388, which is invalid:
- expected the bug to target only the "4.18.0" version, but multiple target versions were set
Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.
In response to this:
/jira refresh
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh
@kannon92: This pull request references Jira Issue OCPBUGS-38388, which is valid. The bug has been moved to the POST state.
3 validation(s) were run on this bug
- bug is open, matching expected state (open)
- bug target version (4.18.0) matches configured target version for branch (4.18.0)
- bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.
In response to this:
/jira refresh
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Job Failure Risk Analysis for sha: ea11c108b00d599951d0f7376d1937d069bbd0e8
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial | IncompleteTests Tests for this run (21) are below the historical average (462): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |
| pull-ci-openshift-origin-master-e2e-gcp-csi | Medium [sig-network] can collect pod-to-host poller pod logs This test has passed 94.74% of 19 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-gcp-ovn-csi'] in the last 14 days. Open Bugs collecting poller pod logs failing in e2e-vsphere-ovn jobs --- [sig-network] can collect host-to-host poller pod logs This test has passed 94.74% of 19 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-gcp-ovn-csi'] in the last 14 days. |
/retest
Job Failure Risk Analysis for sha: 4bb45c498addf50a9ad905ab07b9029758de3aa7
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-edge-zones | IncompleteTests Tests for this run (101) are below the historical average (1558): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |
/retest
Job Failure Risk Analysis for sha: e2ef4d0e67e859525efdf432285caa2c6ef76c80
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial | High [bz-Management Console] clusteroperator/console should not change condition/Available This test has passed 100.00% of 36 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days. Open Bugs [bz-Management Console] clusteroperator/console should not change condition/Available |
/retest
Job Failure Risk Analysis for sha: 304b1ebf85f7f6017de20c184650dc87cb3657e5
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade | High [sig-arch] events should not repeat pathologically for ns/openshift-kube-apiserver-operator This test has passed 99.17% of 121 runs on release 4.18 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:single Upgrade:micro] in the last week. |
Job Failure Risk Analysis for sha: 4bfbeeaa493cb1c8de763b7448434e1d7ccca321
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade | High [sig-arch] events should not repeat pathologically for ns/openshift-machine-api This test has passed 99.86% of 720 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-upgrade'] in the last 14 days. |
| pull-ci-openshift-origin-master-e2e-aws-ovn-serial | High [sig-network-edge][Feature:Idling] Unidling with Deployments [apigroup:route.openshift.io] should handle many TCP connections by possibly dropping those over a certain bound [Serial] [Suite:openshift/conformance/serial] This test has passed 100.00% of 1 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days. |
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade | Medium [sig-network-edge] Verify DNS availability during and after upgrade success This test has passed 94.77% of 172 runs on release 4.18 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:single Upgrade:micro] in the last week. |
/retest-required
Looks like the test isn't de-duping as expected. Have a look at https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/28999/pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial/1825557058091487232. Duplicate instances of the same lease failure? Logic error in the display?
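For illustration only, a minimal sketch of the kind of de-duplication being asked about here, assuming a simple message-to-count map; the helper name and sample messages are hypothetical and are not the collector's actual code:

```go
// Illustrative only: collapse identical lease-failure messages into a
// message -> count map so each distinct failure is reported once.
package main

import "fmt"

// dedupeFailures is a hypothetical helper, not part of the origin collector.
func dedupeFailures(messages []string) map[string]int {
	counts := make(map[string]int)
	for _, m := range messages {
		counts[m]++
	}
	return counts
}

func main() {
	msgs := []string{
		"Failed to update lease: connection refused",
		"Failed to update lease: connection refused",
		"Failed to update lease: connection refused",
	}
	for msg, n := range dedupeFailures(msgs) {
		fmt.Printf("%s (x%d)\n", msg, n)
	}
}
```

Rendering one entry with a count, rather than repeating identical messages, would also make the interval display easier to read.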
very neat
Aug 19 17:30:23.156577 ip-10-0-77-156 kubenswrapper[2478]: E0819 17:30:23.156522 2478 controller.go:195] "Failed to update lease" err="Put "https://api-int.ci-op-ht2pcfvh-a6aef.aws-2.ci.openshift.org:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-77-156.us-west-2.compute.internal?timeout=10s": dial tcp 10.0.54.68:6443: connect: connection refused"
Aug 19 17:30:23.159659 ip-10-0-77-156 kubenswrapper[2478]: E0819 17:30:23.159621 2478 controller.go:195] "Failed to update lease" err="Put "https://api-int.ci-op-ht2pcfvh-a6aef.aws-2.ci.openshift.org:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-77-156.us-west-2.compute.internal?timeout=10s": dial tcp 10.0.87.235:6443: connect: connection refused"
Aug 19 17:30:23.163067 ip-10-0-77-156 kubenswrapper[2478]: E0819 17:30:23.163032 2478 controller.go:195] "Failed to update lease" err="Put "https://api-int.ci-op-ht2pcfvh-a6aef.aws-2.ci.openshift.org:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-77-156.us-west-2.compute.internal?timeout=10s": dial tcp 10.0.54.68:6443: connect: connection refused"
Aug 19 17:30:23.167030 ip-10-0-77-156 kubenswrapper[2478]: E0819 17:30:23.166982 2478 controller.go:195] "Failed to update lease" err="Put "https://api-int.ci-op-ht2pcfvh-a6aef.aws-2.ci.openshift.org:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-77-156.us-west-2.compute.internal?timeout=10s": dial tcp 10.0.87.235:6443: connect: connection refused"
Aug 19 17:30:23.170620 ip-10-0-77-156 kubenswrapper[2478]: E0819 17:30:23.170584 2478 controller.go:195] "Failed to update lease" err="Put "https://api-int.ci-op-ht2pcfvh-a6aef.aws-2.ci.openshift.org:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-10-0-77-156.us-west-2.compute.internal?timeout=10s": dial tcp 10.0.54.68:6443: connect: connection refused"
Aug 19 17:30:23.170756 ip-10-0-77-156 kubenswrapper[2478]: I0819 17:30:23.170623 2478 controller.go:115] "failed to update lease using latest lease, fallback to ensure lease" err="failed 5 attempts to update lease"
from the log. The backoff logic is interesting: five rapid retries, then a sleep. Perhaps we should be looking for "Failed to ensure lease exists" instead?
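If matching the aggregated fallback message is the right direction, a rough sketch of that filter, assuming the literal string from the excerpt above; countLeaseFallbacks is a hypothetical helper, not existing origin code:

```go
// Rough sketch: count the aggregated fallback lines rather than every
// per-attempt "Failed to update lease" error, since each fallback line
// already summarizes a full retry burst ("failed 5 attempts to update lease").
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countLeaseFallbacks is a hypothetical helper for illustration.
func countLeaseFallbacks(journal string) int {
	count := 0
	sc := bufio.NewScanner(strings.NewReader(journal))
	for sc.Scan() {
		if strings.Contains(sc.Text(), "failed to update lease using latest lease, fallback to ensure lease") {
			count++
		}
	}
	return count
}

func main() {
	journal := "E0819 17:30:23.156522 controller.go:195] \"Failed to update lease\" err=\"...connection refused\"\n" +
		"I0819 17:30:23.170623 controller.go:115] \"failed to update lease using latest lease, fallback to ensure lease\" err=\"failed 5 attempts to update lease\"\n"
	fmt.Println("fallback events:", countLeaseFallbacks(journal))
}
```

Counting only the fallback lines would avoid inflating the failure count by the per-attempt retries inside a single burst.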
/hold
Going to look at the logs tomorrow to see if this works.
/retest
Job Failure Risk Analysis for sha: e805d53c4dd605c0c47e7297394c8b43b38907ec
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-kube-apiserver-rollout | IncompleteTests Tests for this run (20) are below the historical average (717): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |
| pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6 | IncompleteTests Tests for this run (20) are below the historical average (1830): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |
| pull-ci-openshift-origin-master-e2e-metal-ipi-ovn | IncompleteTests Tests for this run (20) are below the historical average (2020): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |
Job Failure Risk Analysis for sha: 1558beaef3bd8ce46cccd05b0ee32490736cd9dc
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade | Medium [sig-network] pods should successfully create sandboxes by adding pod to network This test has passed 80.85% of 141 runs on release 4.18 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:single Upgrade:micro] in the last week. Open Bugs s390x: [sig-network] pods should successfully create sandboxes by adding pod to network fails with error adding pod to CNI network |
/retest
Job Failure Risk Analysis for sha: 4a7181edf225cfd1be85f5322eaaa7f03f87380b
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade | Medium [sig-arch] events should not repeat pathologically for ns/openshift-kube-apiserver-operator This test has passed 93.33% of 120 runs on release 4.18 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:single Upgrade:micro] in the last week. |
/hold cancel
/retest
/lgtm
/approve
/hold
you may release the hold when you're confident this is ready.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: deads2k, kannon92
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
- ~~OWNERS~~ [deads2k]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment