origin icon indicating copy to clipboard operation
origin copied to clipboard

OCPEDGE-1788: TNF add etcd cold boot recovery tests from graceful node shutdown

Open clobrano opened this issue 1 month ago • 22 comments

Add three new test cases to validate etcd cluster recovery from cold boot scenarios reached through different graceful/ungraceful shutdown combinations:

  • Cold boot from double GNS: both nodes gracefully shut down simultaneously, then both restart (full cluster cold boot)
  • Cold boot from sequential GNS: first node gracefully shut down, then second node gracefully shut down, then both restart
  • Cold boot from mixed GNS/UGNS: first node gracefully shut down, surviving node then ungracefully shut down, then both restart

Note: The inverse case (UGNS first node, then GNS second) is not tested because in TNF clusters, an ungracefully shut down node is quickly recovered, preventing the ability to wait and gracefully shut down the second node later. The double UGNS scenario is already covered by existing tests.

clobrano avatar Nov 24 '25 16:11 clobrano

Pipeline controller notification This repository is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

openshift-ci-robot avatar Nov 24 '25 16:11 openshift-ci-robot

@clobrano: This pull request references OCPEDGE-1788 which is a valid jira issue.

In response to this:

Add three new test cases to validate etcd cluster recovery from cold boot scenarios reached through different graceful/ungraceful shutdown combinations:

  • Cold boot from double GNS: both nodes gracefully shut down simultaneously, then both restart (full cluster cold boot)
  • Cold boot from sequential GNS: first node gracefully shut down, then second node gracefully shut down, then both restart
  • Cold boot from mixed GNS/UGNS: first node gracefully shut down, surviving node then ungracefully shut down, then both restart

Note: The inverse case (UGNS first node, then GNS second) is not tested because in TNF clusters, an ungracefully shut down node is quickly recovered, preventing the ability to wait and gracefully shut down the second node later. The double UGNS scenario is already covered by existing tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Nov 24 '25 16:11 openshift-ci-robot

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

clobrano avatar Nov 24 '25 16:11 clobrano

@clobrano: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/174335e0-c951-11f0-8750-599f1dc187cb-0

openshift-ci[bot] avatar Nov 24 '25 16:11 openshift-ci[bot]

Scheduling required tests: /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi

openshift-ci-robot avatar Nov 24 '25 20:11 openshift-ci-robot

Scheduling required tests: /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi

openshift-ci-robot avatar Nov 25 '25 12:11 openshift-ci-robot

Scheduling required tests: /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi

openshift-ci-robot avatar Nov 25 '25 20:11 openshift-ci-robot

Scheduling required tests: /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-metal-ipi-ovn-ipv6 /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi

openshift-ci-robot avatar Nov 28 '25 16:11 openshift-ci-robot

Scheduling required tests: /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-metal-ipi-ovn-ipv6 /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi

openshift-ci-robot avatar Dec 01 '25 15:12 openshift-ci-robot

/retest-required

clobrano avatar Dec 01 '25 19:12 clobrano

Last payload job looks like it hit an issue where etcd didn't recover, so we may want to see if that's resolved in the latest runs.

jaypoulz avatar Dec 04 '25 20:12 jaypoulz

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

jaypoulz avatar Dec 04 '25 21:12 jaypoulz

@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/725f6d80-d154-11f0-9458-8ef479ed1ca7-0

openshift-ci[bot] avatar Dec 04 '25 21:12 openshift-ci[bot]

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clobrano, fonta-rh, jaypoulz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar Dec 04 '25 21:12 openshift-ci[bot]

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

clobrano avatar Dec 05 '25 07:12 clobrano

@clobrano: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/bb33b1c0-d1ad-11f0-9c78-aeea81f4f3be-0

openshift-ci[bot] avatar Dec 05 '25 07:12 openshift-ci[bot]

Scheduling required tests: /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-metal-ipi-ovn-ipv6 /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi

openshift-ci-robot avatar Dec 05 '25 13:12 openshift-ci-robot

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

clobrano avatar Dec 05 '25 13:12 clobrano

@clobrano: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f34b4040-d1e0-11f0-8fdb-30c402463c9b-0

openshift-ci[bot] avatar Dec 05 '25 13:12 openshift-ci[bot]

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

clobrano avatar Dec 09 '25 08:12 clobrano

@clobrano: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6cd7a600-d4d5-11f0-8da3-7302111ebe8f-0

openshift-ci[bot] avatar Dec 09 '25 08:12 openshift-ci[bot]

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

clobrano avatar Dec 16 '25 07:12 clobrano

@clobrano: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b57ea7d0-da54-11f0-8a72-6d8cb8097883-0

openshift-ci[bot] avatar Dec 16 '25 07:12 openshift-ci[bot]

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

clobrano avatar Dec 16 '25 09:12 clobrano

@clobrano: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/defdab00-da5f-11f0-8d54-69ff3a391228-0

openshift-ci[bot] avatar Dec 16 '25 09:12 openshift-ci[bot]

Scheduling required tests: /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-metal-ipi-ovn-ipv6 /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi

openshift-ci-robot avatar Dec 16 '25 09:12 openshift-ci-robot

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

clobrano avatar Dec 16 '25 13:12 clobrano

@clobrano: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5ca1c0b0-da87-11f0-8408-ff01e9b9c126-0

openshift-ci[bot] avatar Dec 16 '25 13:12 openshift-ci[bot]

Scheduling required tests: /test e2e-aws-csi /test e2e-aws-ovn-fips /test e2e-aws-ovn-microshift /test e2e-aws-ovn-microshift-serial /test e2e-aws-ovn-serial-1of2 /test e2e-aws-ovn-serial-2of2 /test e2e-gcp-csi /test e2e-gcp-ovn /test e2e-gcp-ovn-upgrade /test e2e-metal-ipi-ovn-ipv6 /test e2e-vsphere-ovn /test e2e-vsphere-ovn-upi

openshift-ci-robot avatar Dec 16 '25 14:12 openshift-ci-robot

/retest-required

clobrano avatar Dec 17 '25 09:12 clobrano