HOSTEDCP-1303: hs, kubevirt: add hosted cluster node restart test

Open maiqueb opened this issue 1 year ago • 45 comments

Ensure a hosted cluster node can be restarted, and that the node eventually reaches the Ready condition.

maiqueb avatar Nov 14 '23 15:11 maiqueb
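
The check described in the PR summary can be pictured as a poll on the node's Ready condition. The following is only a minimal sketch in Python, not the test added by this PR (which lives in the repository's own test suite). The helper names `node_is_ready` and `wait_until`, and the dictionary layout mirroring a Kubernetes Node's `status.conditions`, are assumptions made for illustration.

```python
import time
from typing import Callable


def node_is_ready(node: dict) -> bool:
    """Return True when the node reports a Ready condition with status "True"."""
    conditions = node.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 1.0) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True on success and False on timeout; the predicate is always
    evaluated at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a real test the predicate would re-fetch the node from the hosted cluster's API on every call, so that a node which is NotReady right after the restart is observed becoming Ready later.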

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

openshift-ci[bot] avatar Nov 14 '23 15:11 openshift-ci[bot]

/cc @qinqon

maiqueb avatar Nov 14 '23 17:11 maiqueb

/lgtm
/approve

qinqon avatar Nov 15 '23 11:11 qinqon

/assign @orenc1

qinqon avatar Nov 15 '23 11:11 qinqon

@maiqueb: This pull request references CNV-35204 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

Ensure a hosted cluster node can be restarted, and that the node eventually reaches the Ready condition.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot avatar Nov 15 '23 11:11 openshift-ci-robot

/jira refresh

maiqueb avatar Nov 15 '23 11:11 maiqueb

@maiqueb: This pull request references CNV-35204 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.15." or "openshift-4.15.", but it targets "CNV v4.15.0" instead.

In response to this:

/jira refresh

openshift-ci-robot avatar Nov 15 '23 11:11 openshift-ci-robot

@maiqueb: This pull request references HOSTEDCP-1303 which is a valid jira issue.

In response to this:

Ensure a hosted cluster node can be restarted, and that the node eventually reaches the Ready condition.

openshift-ci-robot avatar Nov 15 '23 11:11 openshift-ci-robot

/test ci/prow/images

maiqueb avatar Nov 15 '23 11:11 maiqueb

@maiqueb: The specified target(s) for /test were not found. The following commands are available to trigger required jobs:

  • /test e2e-aws-jenkins
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-image-registry
  • /test e2e-aws-ovn-serial
  • /test e2e-gcp-ovn
  • /test e2e-gcp-ovn-builds
  • /test e2e-gcp-ovn-image-ecosystem
  • /test e2e-gcp-ovn-upgrade
  • /test e2e-metal-ipi-ovn-ipv6
  • /test images
  • /test lint
  • /test unit
  • /test verify
  • /test verify-deps

The following commands are available to trigger optional jobs:

  • /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback
  • /test e2e-agnostic-ovn-cmd
  • /test e2e-aws
  • /test e2e-aws-csi
  • /test e2e-aws-disruptive
  • /test e2e-aws-etcd-recovery
  • /test e2e-aws-multitenant
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-cgroupsv2
  • /test e2e-aws-ovn-etcd-scaling
  • /test e2e-aws-ovn-kubevirt
  • /test e2e-aws-ovn-single-node
  • /test e2e-aws-ovn-single-node-serial
  • /test e2e-aws-ovn-single-node-upgrade
  • /test e2e-aws-ovn-upgrade
  • /test e2e-aws-proxy
  • /test e2e-azure
  • /test e2e-azure-ovn-etcd-scaling
  • /test e2e-baremetalds-kubevirt
  • /test e2e-gcp-csi
  • /test e2e-gcp-disruptive
  • /test e2e-gcp-fips-serial
  • /test e2e-gcp-ovn-etcd-scaling
  • /test e2e-gcp-ovn-rt-upgrade
  • /test e2e-gcp-ovn-techpreview
  • /test e2e-gcp-ovn-techpreview-serial
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-sdn
  • /test e2e-metal-ipi-serial
  • /test e2e-metal-ipi-serial-ovn-ipv6
  • /test e2e-metal-ipi-virtualmedia
  • /test e2e-openstack-ovn
  • /test e2e-openstack-serial
  • /test e2e-vsphere
  • /test e2e-vsphere-ovn-etcd-scaling
  • /test okd-e2e-gcp

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd
  • pull-ci-openshift-origin-master-e2e-aws-csi
  • pull-ci-openshift-origin-master-e2e-aws-ovn-cgroupsv2
  • pull-ci-openshift-origin-master-e2e-aws-ovn-fips
  • pull-ci-openshift-origin-master-e2e-aws-ovn-kubevirt
  • pull-ci-openshift-origin-master-e2e-aws-ovn-serial
  • pull-ci-openshift-origin-master-e2e-aws-ovn-single-node
  • pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial
  • pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade
  • pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-origin-master-e2e-baremetalds-kubevirt
  • pull-ci-openshift-origin-master-e2e-gcp-csi
  • pull-ci-openshift-origin-master-e2e-gcp-ovn
  • pull-ci-openshift-origin-master-e2e-gcp-ovn-builds
  • pull-ci-openshift-origin-master-e2e-gcp-ovn-rt-upgrade
  • pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade
  • pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6
  • pull-ci-openshift-origin-master-e2e-metal-ipi-sdn
  • pull-ci-openshift-origin-master-e2e-openstack-ovn
  • pull-ci-openshift-origin-master-images
  • pull-ci-openshift-origin-master-lint
  • pull-ci-openshift-origin-master-unit
  • pull-ci-openshift-origin-master-verify
  • pull-ci-openshift-origin-master-verify-deps

In response to this:

/test ci/prow/images

openshift-ci[bot] avatar Nov 15 '23 11:11 openshift-ci[bot]

/test images

maiqueb avatar Nov 15 '23 11:11 maiqueb
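
The exchange above suggests that the status context name (`ci/prow/images`) differs from the accepted `/test` target (`images`) only by a prefix. A small sketch of that mapping follows, assuming the pattern holds for other contexts too; this is an inference from this one example, not something taken from the Prow documentation, and `retest_command` is a hypothetical helper.

```python
# Prefix seen on the status context in this thread (assumed to be general).
PROW_CONTEXT_PREFIX = "ci/prow/"


def retest_command(context: str) -> str:
    """Build a `/test` comment from a status context or a bare job target."""
    target = context
    if target.startswith(PROW_CONTEXT_PREFIX):
        # Drop the context prefix so only the job's short name remains.
        target = target[len(PROW_CONTEXT_PREFIX):]
    return f"/test {target}"


print(retest_command("ci/prow/images"))
```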

/approve

nunnatsa avatar Nov 15 '23 12:11 nunnatsa

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: maiqueb, nunnatsa, qinqon

Once this PR has been reviewed and has the lgtm label, please ask for approval from orenc1. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Nov 15 '23 12:11 openshift-ci[bot]

/test e2e-aws-ovn-kubevirt
/test e2e-baremetalds-kubevirt

maiqueb avatar Nov 15 '23 14:11 maiqueb

Since this test is disruptive, it will need to occur in serial (no other tests executing in parallel). How is that achieved?

I guess I'll have to label it w/ the serial label. I find it strange the migration tests (from which I copied some of this) do not have that tag. @qinqon is this achieved w/ this [Early] tag in https://github.com/openshift/origin/blob/1325dd44a7f920872ce4bd9b7f1a7806433fe3db/test/extended/kubevirt/migration.go#L43 ?

Also, as we're looking at testing, we need to pay close attention to the baremetalds lane. I suspect we're going to encounter issues with monitor tests failing due to the disruption of the pods on the rebooted node.

How can I account for that ?

maiqueb avatar Nov 15 '23 15:11 maiqueb
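
The question above turns on tags embedded in test names. As a rough sketch of the idea, assume a suite selects tests by bracketed tags such as `[Serial]` or `[Early]` in the test name. The test names and the helpers `tags_of` and `partition_by_tag` below are hypothetical, and the sketch does not reproduce how the real test runner schedules anything.

```python
import re
from typing import Iterable, List, Set, Tuple

# Matches bracketed tags such as "[Serial]" or "[Early]" inside a test name.
TAG_PATTERN = re.compile(r"\[([^\]]+)\]")


def tags_of(name: str) -> Set[str]:
    """Return the set of bracketed tags found in a test name."""
    return set(TAG_PATTERN.findall(name))


def partition_by_tag(names: Iterable[str], tag: str) -> Tuple[List[str], List[str]]:
    """Split test names into (tagged, untagged) lists, preserving order."""
    tagged: List[str] = []
    untagged: List[str] = []
    for name in names:
        (tagged if tag in tags_of(name) else untagged).append(name)
    return tagged, untagged


# Hypothetical test names, for illustration only.
names = [
    "[sig-kubevirt] migration [Early] works",
    "[sig-kubevirt] node restart [Serial] works",
    "[sig-kubevirt] services work",
]
serial, parallel = partition_by_tag(names, "Serial")
```

Under this assumption, a disruptive test would carry the serial tag and land in the first list, while everything else remains eligible to run in parallel.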

New changes are detected. LGTM label has been removed.

openshift-ci[bot] avatar Nov 15 '23 16:11 openshift-ci[bot]

Since this test is disruptive, it will need to occur in serial (no other tests executing in parallel). How is that achieved?

I guess I'll have to label it w/ the serial label. I find it strange the migration tests (from which I copied some of this) do not have that tag. @qinqon is this achieved w/ this [Early] tag in

With this we ensure that we run live-migration before anything else.

https://github.com/openshift/origin/blob/1325dd44a7f920872ce4bd9b7f1a7806433fe3db/test/extended/kubevirt/migration.go#L43

?

Also, as we're looking at testing, we need to pay close attention to the baremetalds lane. I suspect we're going to encounter issues with monitor tests failing due to the disruption of the pods on the rebooted node.

How can I account for that ?

Not sure about serial

qinqon avatar Nov 15 '23 17:11 qinqon

I meant the monitor tests @davidvossel mentioned above.

What are those? Are they run in parallel? If I force this test to run sequentially, will it still be impacted by those monitor tests?

maiqueb avatar Nov 15 '23 17:11 maiqueb

@qinqon ^

maiqueb avatar Nov 15 '23 17:11 maiqueb

/test e2e-aws-ovn-kubevirt

failed before starting the suite :cry:

maiqueb avatar Nov 15 '23 18:11 maiqueb

They run in parallel, but I am not sure whether serializing this test will prevent them from running, since they are forced to run in parallel.

@maiqueb are they failing in your local runs?

qinqon avatar Nov 16 '23 08:11 qinqon

The test I've added is now passing in https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/28398/pull-ci-openshift-origin-master-e2e-aws-ovn-kubevirt/1725089795899658240

@qinqon / @davidvossel

maiqueb avatar Nov 16 '23 13:11 maiqueb

/test e2e-baremetalds-kubevirt

qinqon avatar Nov 16 '23 14:11 qinqon

/test e2e-baremetalds-kubevirt

maiqueb avatar Nov 16 '23 17:11 maiqueb

Error from the ODF subscription:

"conditions": [
    {
        "lastTransitionTime": "2023-11-16T18:33:14Z",
        "message": "all available catalogsources are healthy",
        "reason": "AllCatalogSourcesHealthy",
        "status": "False",
        "type": "CatalogSourcesUnhealthy"
    },
    {
        "message": "constraints not satisfiable: no operators found in package odf-operator in the catalog referenced by subscription odf-operator, subscription odf-operator exists",
        "reason": "ConstraintsNotSatisfiable",
        "status": "True",
        "type": "ResolutionFailed"
    }
],

This is the subscription spec:

"spec": {
    "channel": "stable-4.13",
    "installPlanApproval": "Automatic",
    "name": "odf-operator",
    "source": "redhat-operators",
    "sourceNamespace": "openshift-marketplace"
},

qinqon avatar Nov 17 '23 08:11 qinqon
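
To read a status block like the one quoted above at a glance, one can filter for the conditions whose status is "True". In that output, `CatalogSourcesUnhealthy` is "False" (the catalog sources are healthy), so only `ResolutionFailed` remains. A minimal sketch, assuming the conditions have already been loaded into a list of dictionaries; `active_conditions` is a hypothetical helper, and the sample data is copied from the comment above.

```python
from typing import Dict, List


def active_conditions(conditions: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return only the conditions whose status is the string "True"."""
    return [c for c in conditions if c.get("status") == "True"]


# Conditions as reported in the subscription status quoted in this thread.
conditions = [
    {
        "lastTransitionTime": "2023-11-16T18:33:14Z",
        "message": "all available catalogsources are healthy",
        "reason": "AllCatalogSourcesHealthy",
        "status": "False",
        "type": "CatalogSourcesUnhealthy",
    },
    {
        "message": (
            "constraints not satisfiable: no operators found in package "
            "odf-operator in the catalog referenced by subscription "
            "odf-operator, subscription odf-operator exists"
        ),
        "reason": "ConstraintsNotSatisfiable",
        "status": "True",
        "type": "ResolutionFailed",
    },
]

for condition in active_conditions(conditions):
    print(f'{condition["type"]}: {condition["reason"]}')
```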

@maiqueb I am checking whether the stable-4.14 ODF channel would fix the issue: https://github.com/openshift/release/pull/45796

qinqon avatar Nov 17 '23 09:11 qinqon

/test e2e-baremetalds-kubevirt

maiqueb avatar Nov 18 '23 18:11 maiqueb

/test e2e-aws-ovn-kubevirt

maiqueb avatar Nov 20 '23 09:11 maiqueb

/test e2e-baremetalds-kubevirt

maiqueb avatar Nov 20 '23 09:11 maiqueb

/test e2e-aws-ovn-kubevirt
/test e2e-baremetalds-kubevirt

maiqueb avatar Nov 20 '23 11:11 maiqueb