OCPBUGS-18865: Reapply "Merge pull request #28944 from vrutkovs/in-cluster-fixes-v4"

Open vrutkovs opened this issue 1 year ago • 66 comments

This reverts commit 12feee9.

Skip creating disruption junits when the loadbalancer is "localhost". This also needs https://github.com/openshift/ci-tools/pull/4251 to pass in aggregated jobs.
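As a rough illustration of the change described above, the sketch below filters out backends whose load balancer type is "localhost" before any disruption junit would be generated. The `backendSample` type, its fields, and the test-name format are hypothetical and are not taken from the origin code base.

```go
package main

import "fmt"

// backendSample is a hypothetical stand-in for one disruption backend
// observed during a run. The field names are assumptions for illustration.
type backendSample struct {
	Name             string
	LoadBalancerType string
}

// shouldReportJUnit returns false for backends reached through "localhost",
// so that no disruption junit is produced for them.
func shouldReportJUnit(b backendSample) bool {
	return b.LoadBalancerType != "localhost"
}

// junitNames lists the (hypothetical) junit test names that would be emitted.
func junitNames(backends []backendSample) []string {
	names := []string{}
	for _, b := range backends {
		if !shouldReportJUnit(b) {
			continue // skip localhost backends entirely
		}
		names = append(names, fmt.Sprintf("disruption/%s should be available", b.Name))
	}
	return names
}

func main() {
	backends := []backendSample{
		{Name: "kube-api-external", LoadBalancerType: "external-lb"},
		{Name: "kube-api-local", LoadBalancerType: "localhost"},
	}
	for _, n := range junitNames(backends) {
		fmt.Println(n)
	}
}
```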

vrutkovs avatar Aug 06 '24 11:08 vrutkovs

@vrutkovs: This pull request references Jira Issue OCPBUGS-18865, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This reverts commit 12feee9.

Skip creating disruption junits when loadbalancer is "localhost"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Aug 06 '24 11:08 openshift-ci-robot

/test lint
/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade 10

vrutkovs avatar Aug 06 '24 12:08 vrutkovs

@vrutkovs: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/acb68f00-53eb-11ef-97be-f3dacfd72da6-0
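The bot message quotes the command pattern `/payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs)`. The sketch below shows one way such comment lines could be parsed. It is not the bot's actual implementation: the trailing job-name and count groups, the `payloadCommand` type, and the default count of 1 are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// payloadRe reuses the alternation quoted by the bot. The job-name and
// optional count groups are assumptions about the argument layout.
var payloadRe = regexp.MustCompile(`^/payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs)\s+(\S+)(?:\s+(\d+))?\s*$`)

// payloadCommand is a hypothetical parsed form of one command line.
type payloadCommand struct {
	Kind  string
	Job   string
	Count int
}

// parsePayloadCommands scans a comment line by line and returns every
// line that matches the payload command pattern.
func parsePayloadCommands(comment string) []payloadCommand {
	var cmds []payloadCommand
	for _, line := range strings.Split(comment, "\n") {
		m := payloadRe.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil {
			continue
		}
		count := 1 // assumed default when no count is given
		if m[3] != "" {
			if n, err := strconv.Atoi(m[3]); err == nil {
				count = n
			}
		}
		cmds = append(cmds, payloadCommand{Kind: m[1], Job: m[2], Count: count})
	}
	return cmds
}

func main() {
	comment := "/test lint\n/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade 10"
	for _, c := range parsePayloadCommands(comment) {
		fmt.Printf("%s %s x%d\n", c.Kind, c.Job, c.Count)
	}
}
```

Parsing one line at a time means that each command has to sit on its own line of the comment to be recognized.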

openshift-ci[bot] avatar Aug 06 '24 12:08 openshift-ci[bot]

/jira refresh

vrutkovs avatar Aug 06 '24 12:08 vrutkovs

@vrutkovs: This pull request references Jira Issue OCPBUGS-18865, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
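The three validations listed above can be sketched roughly as follows. This is only an illustration of the checks named in the bot output, not the jira-lifecycle-plugin code. The `bug` type and `validateBug` function are hypothetical, and the sketch assumes the MODIFIED bug from the earlier comment still counted as open, since only its state was reported as invalid.

```go
package main

import (
	"fmt"
	"strings"
)

// bug is a hypothetical, trimmed-down view of a Jira issue.
type bug struct {
	Open          bool
	Status        string
	TargetVersion string
}

// validateBug runs the three checks named in the bot output and returns
// a message for each one that fails. An empty result means the bug is valid.
func validateBug(b bug, branchTarget string, validStates []string) []string {
	var problems []string
	if !b.Open {
		problems = append(problems, "expected the bug to be open")
	}
	if b.TargetVersion != branchTarget {
		problems = append(problems, fmt.Sprintf(
			"expected target version %s, but it is %s instead", branchTarget, b.TargetVersion))
	}
	stateOK := false
	for _, s := range validStates {
		if s == b.Status {
			stateOK = true
			break
		}
	}
	if !stateOK {
		problems = append(problems, fmt.Sprintf(
			"expected the bug to be in one of the following states: %s, but it is %s instead",
			strings.Join(validStates, ", "), b.Status))
	}
	return problems
}

func main() {
	states := []string{"NEW", "ASSIGNED", "POST"}
	before := bug{Open: true, Status: "MODIFIED", TargetVersion: "4.17.0"}
	after := bug{Open: true, Status: "POST", TargetVersion: "4.17.0"}
	fmt.Println(validateBug(before, "4.17.0", states)) // one problem: wrong state
	fmt.Println(validateBug(after, "4.17.0", states))  // no problems
}
```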

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Aug 06 '24 12:08 openshift-ci-robot

https://github.com/openshift/ci-tools/pull/4251 would make jobaggregator skip localhost disruptions

vrutkovs avatar Aug 06 '24 18:08 vrutkovs

/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade 10

neisw avatar Aug 06 '24 18:08 neisw

@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/25cc9340-5422-11ef-9ba9-dca46880fd93-0

openshift-ci[bot] avatar Aug 06 '24 18:08 openshift-ci[bot]

Yeah, there are failures in both jobs. Every run in 4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade logs well over 10k 'took too long' messages. I wonder if the extra logging is stressing the disks. We probably need to review metrics on it.

: [sig-etcd] etcd should not log excessive took too long messages 
{  Etcd logged 17011 'took too long' messages, this test fails on any value over 10000 as this is a strong indicator that etcd was very unhealthy throughout the run. This can cause sparodic e2e failures and disruption and typically indicates faster disks are needed. These log message intervals are included in spyglass chart artifacts and can be used to correlate with disruption and failed tests.}
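The failure message above describes a simple threshold check: count the 'took too long' log messages and fail on any value over 10000. A minimal sketch of that idea follows. The function names are hypothetical, and the way the real test collects etcd logs is not shown in this thread.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tookTooLongLimit mirrors the threshold quoted in the failure message.
const tookTooLongLimit = 10000

// countTookTooLong counts the log lines that contain "took too long".
func countTookTooLong(log string) int {
	count := 0
	scanner := bufio.NewScanner(strings.NewReader(log))
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), "took too long") {
			count++
		}
	}
	return count
}

// exceedsLimit reports whether a count would fail the check, which
// triggers on any value over the limit.
func exceedsLimit(count int) bool {
	return count > tookTooLongLimit
}

func main() {
	sample := "apply request took too long\nhealth check ok\nread-only range request took too long\n"
	fmt.Println(countTookTooLong(sample))
	// 17011 is the count reported in the failure message above.
	fmt.Println(exceedsLimit(17011))
}
```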

neisw avatar Aug 06 '24 23:08 neisw

/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-aws-upgrade-ovn-single-node 10

neisw avatar Aug 08 '24 22:08 neisw

@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e9f1c0d0-55d3-11ef-89b5-1733ecabd6d6-0

openshift-ci[bot] avatar Aug 08 '24 22:08 openshift-ci[bot]

/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-aws-upgrade-ovn-single-node 10

neisw avatar Aug 09 '24 12:08 neisw

@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/346a8720-564a-11ef-83b9-dbe34dc844a2-0

openshift-ci[bot] avatar Aug 09 '24 12:08 openshift-ci[bot]

/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-aws-upgrade-ovn-single-node 10
/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade 10

vrutkovs avatar Aug 09 '24 16:08 vrutkovs

@vrutkovs: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/82242890-5668-11ef-8cad-7469e8e59e94-0

openshift-ci[bot] avatar Aug 09 '24 16:08 openshift-ci[bot]

Job Failure Risk Analysis for sha: 57c740f839bf643c7b33f050839682f2233a23b1

Job Name: pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade
Failure Risk: High
  • [sig-apps] job-upgrade
    This test has passed 100.00% of 193 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-upgrade'] in the last 14 days.
  • [sig-network] there should be nearly zero single second disruptions for kube-api-http2-internal-lb-reused-connections
    This test has passed 100.00% of 19 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-upgrade'] in the last 14 days.

Job Name: pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade
Failure Risk: High
  • [sig-node][invariant] alert/TargetDown should not be at or above info in ns/kube-system
    This test has passed 99.98% of 4739 runs on release 4.18 [Overall] in the last week.
    Open Bugs: Kubelet metrics endpoints experiencing prolonged outages

Job Name: pull-ci-openshift-origin-master-e2e-aws-ovn-serial
Failure Risk: High
  • [sig-api-machinery][Feature:ResourceQuota] Object count should properly count the number of persistentvolumeclaims resources [Serial] [Suite:openshift/conformance/serial]
    This test has passed 100.00% of 30 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.
  • [sig-storage] CSI Mock selinux on mount metrics SELinuxMount metrics [LinuxOnly] [Feature:SELinux] [Serial] warning is bumped on two Pods with a different context on RWO volume [FeatureGate:SELinuxMountReadWriteOncePod] [Beta] [Feature:SELinuxMountReadWriteOncePodOnly] [Suite:openshift/conformance/serial] [Suite:k8s]
    This test has passed 100.00% of 30 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.
  • [sig-network][Feature:EgressIP][apigroup:operator.openshift.io] [external-targets][apigroup:user.openshift.io][apigroup:security.openshift.io] pods should have the assigned EgressIPs and EgressIPs can be updated [Skipped:Network/OpenShiftSDN] [Serial] [Suite:openshift/conformance/serial]
    This test has passed 100.00% of 30 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.
  • [sig-storage] CSI Mock selinux on mount metrics SELinuxMount metrics [LinuxOnly] [Feature:SELinux] [Serial] error is bumped on two Pods with a different context on RWOP volume [FeatureGate:SELinuxMountReadWriteOncePod] [Beta] [Suite:openshift/conformance/serial] [Suite:k8s]
    This test has passed 100.00% of 30 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.

openshift-trt-bot avatar Aug 13 '24 11:08 openshift-trt-bot

/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-aws-upgrade-ovn-single-node 10
/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade 10

vrutkovs avatar Aug 13 '24 20:08 vrutkovs

@vrutkovs: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/73ea7ce0-59b3-11ef-974e-a5a11bcefefd-0

openshift-ci[bot] avatar Aug 13 '24 20:08 openshift-ci[bot]

I think SNO looks better: we don't see the duplicated disruption in the conformance run. The aggregated disruption fails, but we are in the process of removing SNO aggregated disruption (https://github.com/openshift/ci-tools/pull/4263).

We still see the etcd 'took too long' failures on Azure.

neisw avatar Aug 14 '24 15:08 neisw

/payload-job periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

vrutkovs avatar Aug 15 '24 11:08 vrutkovs

@vrutkovs: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/93ab2f70-5afd-11ef-933b-a17270c26bed-0

openshift-ci[bot] avatar Aug 15 '24 11:08 openshift-ci[bot]

/payload-job periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

vrutkovs avatar Aug 15 '24 17:08 vrutkovs

@vrutkovs: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/01a473c0-5b2f-11ef-84cd-741ed4d22cfa-0

openshift-ci[bot] avatar Aug 15 '24 17:08 openshift-ci[bot]

Job Failure Risk Analysis for sha: 1b25893e85db38b5c20d51551d1dddc9f6cd4301

Job Name: pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade
Failure Risk: Medium
  • [sig-network] pods should successfully create sandboxes by adding pod to network
    This test has passed 90.91% of 121 runs on release 4.18 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:single Upgrade:micro] in the last week.
    Open Bugs: s390x: [sig-network] pods should successfully create sandboxes by adding pod to network fails with error adding pod to CNI network
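The pass rates quoted in these risk reports read as plain ratios of passing runs to total runs, rounded to two decimals. The sketch below only reproduces that arithmetic. The pass counts (110 and 4738) are back-computed assumptions that happen to round to the reported 90.91% and 99.98%; the reports do not state them, and `passPercent` is a hypothetical helper.

```go
package main

import "fmt"

// passPercent formats passes/runs as a percentage with two decimals.
// It returns "n/a" when there are no runs to avoid dividing by zero.
func passPercent(passes, runs int) string {
	if runs == 0 {
		return "n/a"
	}
	return fmt.Sprintf("%.2f", 100*float64(passes)/float64(runs))
}

func main() {
	// Assumed pass counts, chosen so the ratio matches the reported rates.
	fmt.Println(passPercent(110, 121))   // 90.91
	fmt.Println(passPercent(4738, 4739)) // 99.98
}
```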

openshift-trt-bot avatar Aug 15 '24 22:08 openshift-trt-bot

/payload-job periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

vrutkovs avatar Aug 16 '24 05:08 vrutkovs

@vrutkovs: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b5a87be0-5b94-11ef-9df3-b87302f39381-0

openshift-ci[bot] avatar Aug 16 '24 05:08 openshift-ci[bot]

/payload-job periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

vrutkovs avatar Aug 16 '24 12:08 vrutkovs

@vrutkovs: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/22aaaea0-5bc9-11ef-9527-1c1b49f00857-0

openshift-ci[bot] avatar Aug 16 '24 12:08 openshift-ci[bot]

Job Failure Risk Analysis for sha: dcd8740d163bc4b7a249222cbdabcd2deb4682a5

Job Name: pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade
Failure Risk: High
  • [Jira:"kube-apiserver"] monitor test apiserver-availability setup
    This test has passed 100.00% of 750 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-gcp-ovn-upgrade' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-gcp-ovn-upgrade'] in the last 14 days.
  • [Jira:"kube-apiserver"] monitor test apiserver-availability test evaluation
    This test has passed 100.00% of 750 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-gcp-ovn-upgrade' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-gcp-ovn-upgrade'] in the last 14 days.

Job Name: pull-ci-openshift-origin-master-e2e-gcp-ovn
Failure Risk: High
  • [Jira:"kube-apiserver"] monitor test apiserver-availability test evaluation
    This test has passed 100.00% of 23 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-gcp-ovn'] in the last 14 days.
  • [Jira:"kube-apiserver"] monitor test apiserver-availability setup
    This test has passed 100.00% of 23 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-gcp-ovn'] in the last 14 days.

Job Name: pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade
Failure Risk: High
  • [Jira:"kube-apiserver"] monitor test apiserver-availability setup
    This test has passed 100.00% of 518 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-upgrade'] in the last 14 days.
  • [Jira:"kube-apiserver"] monitor test apiserver-availability test evaluation
    This test has passed 100.00% of 518 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-upgrade'] in the last 14 days.

Job Name: pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade
Failure Risk: High
  • [Jira:"kube-apiserver"] monitor test apiserver-availability test evaluation
    This test has passed 99.93% of 4167 runs on release 4.18 [Overall] in the last week.
  • [Jira:"kube-apiserver"] monitor test apiserver-availability setup
    This test has passed 100.00% of 131 runs on release 4.18 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:single Upgrade:micro] in the last week.
  • [Jira:"kube-apiserver"] monitor test apiserver-availability test evaluation
    This test has passed 99.93% of 4167 runs on release 4.18 [Overall] in the last week.
  • [Jira:"kube-apiserver"] monitor test apiserver-availability setup
    This test has passed 100.00% of 130 runs on release 4.18 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:single Upgrade:micro] in the last week.

Job Name: pull-ci-openshift-origin-master-e2e-aws-ovn-serial
Failure Risk: High
  • [Jira:"kube-apiserver"] monitor test apiserver-availability test evaluation
    This test has passed 100.00% of 39 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.
  • [Jira:"kube-apiserver"] monitor test apiserver-availability setup
    This test has passed 100.00% of 39 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.

openshift-trt-bot avatar Aug 16 '24 16:08 openshift-trt-bot

Job Failure Risk Analysis for sha: dcd8740d163bc4b7a249222cbdabcd2deb4682a5

Job Name: pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial
Failure Risk: High
  • [sig-arch] events should not repeat pathologically for ns/openshift-authentication-operator
    This test has passed 100.00% of 39 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.
  • [bz-Monitoring] clusteroperator/monitoring should not change condition/Available
    This test has passed 100.00% of 39 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial'] in the last 14 days.
    Open Bugs: monitoring ClusterOperator should not blip Available=Unknown on client rate limiter

openshift-trt-bot avatar Aug 16 '24 16:08 openshift-trt-bot