NO-JIRA: Improve unexpected reboot test output

Open dgoodwin opened this issue 8 months ago • 9 comments

The test output today is pretty confusing: times are not formatted as intended because the events are printed as slices of structs, so each timestamp shows up as raw internal fields. This change formats the timestamps to be human-readable in the output, improves a couple of variable names, and logs the boots for each node in chronological order instead of reverse.

Old output:

{  fail [github.com/openshift/origin/test/extended/machines/cluster.go:176]: Unexpected error:
    <errors.aggregate | len:2, cap:2>: 
    [unexpected boot for node/ci-op-ihmplk3x-b6ce9-6jx4j-worker-eastus1-29cbp, got [{Boot {0 63879265172 <nil>}} {Boot {0 63879264967 <nil>}} {Boot {0 63879262740 <nil>}} {RebootRequest {0 63879262620 0xd471420}} {Boot {0 63879258488 <nil>}} {RebootRequest {0 63879258388 0xd471420}} {Boot {0 63879258175 <nil>}}], expected reboot for node/ci-op-ihmplk3x-b6ce9-6jx4j-worker-eastus1-29cbp, got [{Boot {0 63879265172 <nil>}} {Boot {0 63879264967 <nil>}} {Boot {0 63879262740 <nil>}} {RebootRequest {0 63879262620 0xd471420}} {Boot {0 63879258488 <nil>}} {RebootRequest {0 63879258388 0xd471420}} {Boot {0 63879258175 <nil>}}]]
    [
        <*errors.errorString | 0xc00217e540>{
            s: "unexpected boot for node/ci-op-ihmplk3x-b6ce9-6jx4j-worker-eastus1-29cbp, got [{Boot {0 63879265172 <nil>}} {Boot {0 63879264967 <nil>}} {Boot {0 63879262740 <nil>}} {RebootRequest {0 63879262620 0xd471420}} {Boot {0 63879258488 <nil>}} {RebootRequest {0 63879258388 0xd471420}} {Boot {0 63879258175 <nil>}}]",
        },
        <*errors.errorString | 0xc00217e580>{
            s: "expected reboot for node/ci-op-ihmplk3x-b6ce9-6jx4j-worker-eastus1-29cbp, got [{Boot {0 63879265172 <nil>}} {Boot {0 63879264967 <nil>}} {Boot {0 63879262740 <nil>}} {RebootRequest {0 63879262620 0xd471420}} {Boot {0 63879258488 <nil>}} {RebootRequest {0 63879258388 0xd471420}} {Boot {0 63879258175 <nil>}}]",
        },
    ]
occurred
Ginkgo exit error 1: exit with code 1}

dgoodwin avatar Apr 09 '25 17:04 dgoodwin

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Apr 09 '25 17:04 openshift-ci[bot]

Risk analysis has seen new tests most likely introduced by this PR. Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: dab9f95e179684a56a3f2cdd1beaf94e2b08e67b

| Job Name | New Test Risk |
| --- | --- |
| pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack-bgp-techpreview | Medium - "[sig-network][OCPFeatureGate:RouteAdvertisements][Feature:RouteAdvertisements][apigroup:operator.openshift.io] when using openshift ovn-kubernetes [PodNetwork] Advertising the default network [apigroup:user.openshift.io][apigroup:security.openshift.io] External host should be able to query route advertised pods by the pod IP [Suite:openshift/conformance/parallel]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack-bgp-techpreview | Medium - "[sig-network][OCPFeatureGate:RouteAdvertisements][Feature:RouteAdvertisements][apigroup:operator.openshift.io] when using openshift ovn-kubernetes [PodNetwork] Advertising the default network [apigroup:user.openshift.io][apigroup:security.openshift.io] pods should communicate with external host without being SNATed [Suite:openshift/conformance/parallel]" is a new test, and was only seen in one job. |

New tests seen in this PR at sha: dab9f95e179684a56a3f2cdd1beaf94e2b08e67b

  • "[sig-network][OCPFeatureGate:RouteAdvertisements][Feature:RouteAdvertisements][apigroup:operator.openshift.io] when using openshift ovn-kubernetes [PodNetwork] Advertising the default network [apigroup:user.openshift.io][apigroup:security.openshift.io] External host should be able to query route advertised pods by the pod IP [Suite:openshift/conformance/parallel]" [Total: 1, Pass: 1, Fail: 0, Flake: 1]
  • "[sig-network][OCPFeatureGate:RouteAdvertisements][Feature:RouteAdvertisements][apigroup:operator.openshift.io] when using openshift ovn-kubernetes [PodNetwork] Advertising the default network [apigroup:user.openshift.io][apigroup:security.openshift.io] pods should communicate with external host without being SNATed [Suite:openshift/conformance/parallel]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]

openshift-trt[bot] avatar Apr 09 '25 23:04 openshift-trt[bot]

/test e2e-metal-ipi-ovn e2e-metal-ipi-virtualmedia e2e-metal-ipi-ovn-dualstack-local-gateway

MaysaMacedo avatar Apr 10 '25 13:04 MaysaMacedo

@dgoodwin It looks good. Did you have a chance to test and see if the output is really what you expect? Can you add a link to some jira or use no-jira in the PR title?

MaysaMacedo avatar Apr 10 '25 13:04 MaysaMacedo

Unfortunately, this test fails too rarely to reproduce in the PR, so we'd have to push it into the wild and wait.

dgoodwin avatar Apr 10 '25 17:04 dgoodwin

Actually, I can add a bogus failure to make it fail. I'll try that.

dgoodwin avatar Apr 10 '25 17:04 dgoodwin

@dgoodwin: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Apr 10 '25 17:04 openshift-ci-robot

/retest

dgoodwin avatar Aug 13 '25 15:08 dgoodwin

Job Failure Risk Analysis for sha: 5beff6cc7b5ea0c420d7a69706a18391af30473b

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2 High
[sig-node] Managed cluster should verify that nodes have no unexpected reboots [Late] [Suite:openshift/conformance/parallel]
This test has passed 99.09% of 3199 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones High
[sig-node] Managed cluster should verify that nodes have no unexpected reboots [Late] [Suite:openshift/conformance/parallel]
This test has passed 99.09% of 3199 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling High
[sig-node] Managed cluster should verify that nodes have no unexpected reboots [Late] [Suite:openshift/conformance/parallel]
This test has passed 99.09% of 3199 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-kube-apiserver-rollout High
[sig-node] Managed cluster should verify that nodes have no unexpected reboots [Late] [Suite:openshift/conformance/parallel]
This test has passed 99.09% of 3199 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-proxy High
[sig-node] Managed cluster should verify that nodes have no unexpected reboots [Late] [Suite:openshift/conformance/parallel]
This test has passed 99.09% of 3199 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling High
[sig-node] Managed cluster should verify that nodes have no unexpected reboots [Late] [Suite:openshift/conformance/parallel]
This test has passed 99.09% of 3199 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling High
[sig-node] Managed cluster should verify that nodes have no unexpected reboots [Late] [Suite:openshift/conformance/parallel]
This test has passed 99.09% of 3199 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling High
[sig-node] Managed cluster should verify that nodes have no unexpected reboots [Late] [Suite:openshift/conformance/parallel]
This test has passed 99.09% of 3199 runs on release 4.21 [Overall] in the last week.

openshift-trt[bot] avatar Oct 03 '25 20:10 openshift-trt[bot]

@dgoodwin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-fips-serial | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-gcp-fips-serial |
| ci/prow/4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback |
| ci/prow/e2e-metal-ipi-serial-ovn-ipv6 | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-metal-ipi-serial-ovn-ipv6 |
| ci/prow/e2e-aws | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-aws |
| ci/prow/okd-e2e-gcp | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test okd-e2e-gcp |
| ci/prow/e2e-metal-ipi-serial | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-metal-ipi-serial |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-techpreview | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-metal-ipi-ovn-dualstack-bgp-techpreview |
| ci/prow/e2e-aws-ovn-serial | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | true | /test e2e-aws-ovn-serial |
| ci/prow/e2e-aws-ovn-serial-publicnet | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | true | /test e2e-aws-ovn-serial-publicnet |
| ci/prow/e2e-gcp-ovn-etcd-scaling | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-gcp-ovn-etcd-scaling |
| ci/prow/e2e-aws-ovn | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-aws-ovn |
| ci/prow/e2e-agnostic-ovn-cmd | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-agnostic-ovn-cmd |
| ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-vsphere-ovn-dualstack-primaryv6 |
| ci/prow/e2e-metal-ipi-virtualmedia | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-metal-ipi-virtualmedia |
| ci/prow/e2e-openstack-serial | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-openstack-serial |
| ci/prow/e2e-openstack-ovn | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-vsphere-ovn-upi | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | true | /test e2e-vsphere-ovn-upi |
| ci/prow/okd-scos-e2e-aws-ovn | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-hypershift-conformance | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-hypershift-conformance |
| ci/prow/e2e-metal-ipi-ovn | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-metal-ipi-ovn |
| ci/prow/e2e-metal-ipi-ovn-kube-apiserver-rollout | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-metal-ipi-ovn-kube-apiserver-rollout |
| ci/prow/e2e-metal-ipi-serial-ovn-ipv6-2of2 | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-metal-ipi-serial-ovn-ipv6-2of2 |
| ci/prow/e2e-aws-ovn-fips | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | true | /test e2e-aws-ovn-fips |
| ci/prow/e2e-azure-ovn-upgrade | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-azure-ovn-upgrade |
| ci/prow/e2e-gcp-ovn-techpreview-serial-1of2 | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-gcp-ovn-techpreview-serial-1of2 |
| ci/prow/e2e-aws-ovn-single-node | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/e2e-gcp-disruptive | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-gcp-disruptive |
| ci/prow/e2e-metal-ipi-serial-2of2 | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-metal-ipi-serial-2of2 |
| ci/prow/e2e-gcp-ovn | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | true | /test e2e-gcp-ovn |
| ci/prow/e2e-aws-ovn-cgroupsv2 | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-aws-ovn-cgroupsv2 |
| ci/prow/e2e-vsphere-ovn-etcd-scaling | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-vsphere-ovn-etcd-scaling |
| ci/prow/e2e-aws-ovn-single-node-serial | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-aws-ovn-single-node-serial |
| ci/prow/e2e-azure-ovn-etcd-scaling | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-azure-ovn-etcd-scaling |
| ci/prow/e2e-aws-ovn-single-node-upgrade | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-metal-ipi-serial-ovn-ipv6-1of2 | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-metal-ipi-serial-ovn-ipv6-1of2 |
| ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-metal-ipi-ovn-dualstack-local-gateway |
| ci/prow/e2e-aws-disruptive | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-metal-ipi-serial-1of2 | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-metal-ipi-serial-1of2 |
| ci/prow/e2e-aws-proxy | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-aws-proxy |
| ci/prow/e2e-aws-ovn-serial-1of2 | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | true | /test e2e-aws-ovn-serial-1of2 |
| ci/prow/e2e-gcp-ovn-techpreview | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-gcp-ovn-techpreview |
| ci/prow/e2e-azure | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-azure |
| ci/prow/e2e-gcp-ovn-techpreview-serial-2of2 | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-gcp-ovn-techpreview-serial-2of2 |
| ci/prow/e2e-aws-ovn-etcd-scaling | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-aws-ovn-etcd-scaling |
| ci/prow/e2e-aws-ovn-kube-apiserver-rollout | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-aws-ovn-kube-apiserver-rollout |
| ci/prow/e2e-aws-ovn-edge-zones | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-aws-ovn-edge-zones |
| ci/prow/e2e-metal-ipi-ovn-dualstack | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | false | /test e2e-metal-ipi-ovn-dualstack |
| ci/prow/e2e-aws-ovn-serial-2of2 | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | true | /test e2e-aws-ovn-serial-2of2 |
| ci/prow/e2e-vsphere-ovn | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | true | /test e2e-vsphere-ovn |
| ci/prow/e2e-gcp-csi | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | true | /test e2e-gcp-csi |
| ci/prow/e2e-aws-csi | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | true | /test e2e-aws-csi |
| ci/prow/go-verify-deps | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | true | /test go-verify-deps |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 5beff6cc7b5ea0c420d7a69706a18391af30473b | link | true | /test e2e-metal-ipi-ovn-ipv6 |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Nov 18 '25 13:11 openshift-ci[bot]

/close

dgoodwin avatar Nov 18 '25 13:11 dgoodwin

@dgoodwin: Closed this PR.

In response to this:

/close

openshift-ci[bot] avatar Nov 18 '25 13:11 openshift-ci[bot]