origin
coreos/kdump: Add kdump e2e test using mco
Add an e2e test for OCP CI that validates that enabling kdump and generating a kernel core via machine config works. This is one of the steps to enhance the kdump feature.
@gursewak1997 ci/prow/verify is failing like:
FAILURE after 27.595s: hack/verify-generated.sh:13: executing '/go/src/github.com/openshift/origin/hack/update-generated.sh' expecting success: the command returned the wrong error code
There was no output from the command.
Standard error from the command:
failed: all tests must define a [sig-XXXX] tag or have a rule "[Top Level] kdump TestKdump"
exit status 1
Yup I am going over the doc to add the relevant tags before I re-commit.
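For readers unfamiliar with the rule behind the verify failure above, here is a minimal sketch of that kind of check. It is not the actual logic run by hack/update-generated.sh. The tag pattern, the function name `missing_sig_tag`, and the "fixed" test name are assumptions made for illustration only.

```python
import re

# Assumption: a sig tag looks like "[sig-<name>]"; the real check in
# openshift/origin may accept a different character set or apply extra rules.
SIG_TAG = re.compile(r"\[sig-[a-z0-9-]+\]")


def missing_sig_tag(test_names):
    """Return the test names that carry no [sig-...] tag."""
    return [name for name in test_names if not SIG_TAG.search(name)]


names = [
    "[Top Level] kdump TestKdump",   # name taken from the failure output above
    "[sig-coreos] kdump TestKdump",  # hypothetical name with a sig tag added
]

# Only the untagged name is reported.
print(missing_sig_tag(names))
```

Under these assumptions, only the first name is flagged, which matches the shape of the error reported by ci/prow/verify.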
/retest
The e2e-aws-single-node* tests continue to hit the "Failed while waiting on imagestream import" problem that was affecting the broader CI fleet, but since they are not required, we can merge over red here.
Another question, due to my unfamiliarity: when/where does this new test run? Will it be part of CI jobs?
I'm struggling to find evidence that this test was run in any of the CI jobs that ran against the PR. (Furthermore, I can only see the other [sig-coreos] test running as part of e2e-gcp)
/test help
@miabbott: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:
/test e2e-aws-fips
/test e2e-aws-image-registry
/test e2e-aws-jenkins
/test e2e-aws-serial
/test e2e-gcp
/test e2e-gcp-builds
/test e2e-gcp-image-ecosystem
/test e2e-gcp-upgrade
/test extended_gssapi
/test extended_ldap_groups
/test extended_networking
/test images
/test lint
/test verify
/test verify-deps
The following commands are available to trigger optional jobs:
/test e2e-agnostic-cmd
/test e2e-aws
/test e2e-aws-cgroupsv2
/test e2e-aws-csi
/test e2e-aws-csi-migration
/test e2e-aws-disruptive
/test e2e-aws-multitenant
/test e2e-aws-ovn
/test e2e-aws-proxy
/test e2e-aws-single-node
/test e2e-aws-single-node-serial
/test e2e-aws-single-node-upgrade
/test e2e-aws-upgrade
/test e2e-azure
/test e2e-gcp-csi
/test e2e-gcp-disruptive
/test e2e-gcp-fips-serial
/test e2e-gcp-ovn-rt-upgrade
/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
/test e2e-metal-ipi-serial
/test e2e-metal-ipi-serial-ovn-ipv6
/test e2e-metal-ipi-virtualmedia
/test e2e-openstack
/test e2e-openstack-serial
/test e2e-vsphere
/test okd-e2e-gcp
Use /test all to run the following jobs that were automatically triggered:
pull-ci-openshift-origin-master-e2e-agnostic-cmd
pull-ci-openshift-origin-master-e2e-aws-cgroupsv2
pull-ci-openshift-origin-master-e2e-aws-csi
pull-ci-openshift-origin-master-e2e-aws-fips
pull-ci-openshift-origin-master-e2e-aws-serial
pull-ci-openshift-origin-master-e2e-aws-single-node
pull-ci-openshift-origin-master-e2e-aws-single-node-upgrade
pull-ci-openshift-origin-master-e2e-gcp
pull-ci-openshift-origin-master-e2e-gcp-builds
pull-ci-openshift-origin-master-e2e-gcp-csi
pull-ci-openshift-origin-master-e2e-gcp-ovn-rt-upgrade
pull-ci-openshift-origin-master-e2e-gcp-upgrade
pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6
pull-ci-openshift-origin-master-images
pull-ci-openshift-origin-master-lint
pull-ci-openshift-origin-master-verify
pull-ci-openshift-origin-master-verify-deps
In response to this:
/test help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I'm still working out how and when the tests run myself. Ideally, the kdump test should run for this PR. I did see the test run in one of the initial CI runs, where it failed. After I reran, the kdump test didn't run and the overall CI check passed.
/assign travier
See e.g. https://github.com/openshift/machine-config-operator/commit/825be33519852121fc1cc94695d1a759fb7e218b which we need to copy into privileged pods now as part of a recent security policy change
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: gursewak1997
Once this PR has been reviewed and has the lgtm label, please assign adambkaplan for approval by writing /assign @adambkaplan in a comment. For more information see: The Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/retest
/test
@stbenjam: The /test command needs one or more targets.
We've been bitten hard by some unreliable serial tests lately, so I'd feel better if I saw it pass a couple of times on some different configurations.
/test e2e-aws-serial
/test e2e-gcp-fips-serial
/test e2e-gcp-fips-serial
/test e2e-metal-ipi-serial
/test e2e-metal-ipi-serial-ovn-ipv6
/test e2e-openstack-serial
Note that this test actively makes a node crash, so I'm not sure how we should account for that in general.
We have some synthetic tests that hunt for segfaults and Go panics, but I don't think anything is looking for kernel panics. Does the node recover? I'm wondering if it might not, because machine-config goes degraded....
See this run: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/27291/pull-ci-openshift-origin-master-e2e-aws-serial/1550551517881176064
{Operator degraded (RequiredPoolsFailed): Failed to resync 4.12.0-0.ci.test-2022-07-22-185749-ci-op-59ryclwt-latest because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool worker is not ready, retrying. Status: (pool degraded: true total: 3, ready 2, updated: 2, unavailable: 1)] Operator degraded (RequiredPoolsFailed): Failed to resync 4.12.0-0.ci.test-2022-07-22-185749-ci-op-59ryclwt-latest because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool worker is not ready, retrying. Status: (pool degraded: true total: 3, ready 2, updated: 2, unavailable: 1)]}
As it's a fake crash, the node should just reboot after dumping the crash dump. The unavailability should be temporary.
But indeed, these would be the symptoms if the node failed to reboot for any reason.
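To make the degraded-pool message easier to reason about, here is a rough sketch that pulls the counters out of the status string quoted above and decides whether the pool looks settled. The helper names `parse_pool_status` and `pool_settled` are hypothetical, and the "settled" criterion is an assumption for illustration, not how the machine-config operator itself evaluates a pool.

```python
import re

# Matches "degraded: true", "total: 3", "ready 2" (no colon in the log), etc.
FIELD = re.compile(r"\b(degraded|total|ready|updated|unavailable):?\s+(\w+)")


def parse_pool_status(message):
    """Extract the pool counters from a status message into a dict."""
    fields = {}
    for key, value in FIELD.findall(message):
        # "degraded" is a boolean flag; every other field is a machine count.
        fields[key] = (value == "true") if key == "degraded" else int(value)
    return fields


def pool_settled(fields):
    """Assumed criterion: not degraded, nothing unavailable, all machines ready."""
    return (
        not fields["degraded"]
        and fields["unavailable"] == 0
        and fields["ready"] == fields["total"]
    )


# Status text copied from the operator message in the linked run.
message = (
    "error pool worker is not ready, retrying. Status: "
    "(pool degraded: true total: 3, ready 2, updated: 2, unavailable: 1)"
)

status = parse_pool_status(message)
print(status)
print(pool_settled(status))
```

With the message from the linked run, the parsed counters show one of three workers unavailable and the pool degraded, so the assumed check reports the pool as not settled. A test that crashes a node on purpose would want to wait until a check along these lines passes before finishing.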
/test e2e-aws-disruptive
I updated the tags for this test to [Slow] (since the test typically takes more than 5 minutes to finish) and [Disruptive] (since it includes rebooting a node). Also, since any [Disruptive] test is assumed to qualify for the [Serial] label and need not be labelled as both, as per this doc, I dropped the [Serial] label.
I am not too sure which job the kdump test should run in now, because I don't see it in `ci/prow/e2e-aws-disruptive` or `ci/prow/e2e-aws-serial`.
On the other hand, I have also updated the test so that it leaves no degraded nodes after it finishes.
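The labelling convention described above (a [Disruptive] test is implicitly [Serial]) can be sketched as follows. This is an illustration only: the function `effective_labels` and the sample test name are hypothetical, and the sketch says nothing about how the real test suites select tests.

```python
import re

# Any bracketed token in a test name is treated as a label.
TAG = re.compile(r"\[([^\]]+)\]")


def effective_labels(test_name):
    """Return the labels in a test name, adding the implied Serial label."""
    labels = set(TAG.findall(test_name))
    # Convention from the comment above: Disruptive implies Serial,
    # so the Serial label does not need to be written explicitly.
    if "Disruptive" in labels:
        labels.add("Serial")
    return labels


# Hypothetical test name using the tags mentioned in the comment above.
name = "[sig-coreos][Slow][Disruptive] kdump TestKdump"
print(sorted(effective_labels(name)))
```

Under this reading, dropping the explicit [Serial] label does not change the effective label set, which is why the test name only carries [Slow] and [Disruptive].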
@gursewak1997: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-gcp-fips-serial | f1fd75e7c018f7e5452a9eeb20cec0d2fcc3937e | link | false | /test e2e-gcp-fips-serial |
| ci/prow/e2e-metal-ipi-serial-ovn-ipv6 | f1fd75e7c018f7e5452a9eeb20cec0d2fcc3937e | link | false | /test e2e-metal-ipi-serial-ovn-ipv6 |
| ci/prow/e2e-openstack-serial | f1fd75e7c018f7e5452a9eeb20cec0d2fcc3937e | link | false | /test e2e-openstack-serial |
| ci/prow/e2e-metal-ipi-serial | f1fd75e7c018f7e5452a9eeb20cec0d2fcc3937e | link | false | /test e2e-metal-ipi-serial |
| ci/prow/e2e-aws-single-node-upgrade | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | false | /test e2e-aws-single-node-upgrade |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | false | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-gcp-ovn-rt-upgrade | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | false | /test e2e-gcp-ovn-rt-upgrade |
| ci/prow/e2e-aws-serial | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | true | /test e2e-aws-serial |
| ci/prow/e2e-aws-single-node | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | false | /test e2e-aws-single-node |
| ci/prow/e2e-aws-disruptive | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-gcp-ovn | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | true | /test e2e-gcp-ovn |
@gursewak1997: PR needs rebase.
@gursewak1997: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-gcp-fips-serial | f1fd75e7c018f7e5452a9eeb20cec0d2fcc3937e | link | false | /test e2e-gcp-fips-serial |
| ci/prow/e2e-metal-ipi-serial-ovn-ipv6 | f1fd75e7c018f7e5452a9eeb20cec0d2fcc3937e | link | false | /test e2e-metal-ipi-serial-ovn-ipv6 |
| ci/prow/e2e-openstack-serial | f1fd75e7c018f7e5452a9eeb20cec0d2fcc3937e | link | false | /test e2e-openstack-serial |
| ci/prow/e2e-metal-ipi-serial | f1fd75e7c018f7e5452a9eeb20cec0d2fcc3937e | link | false | /test e2e-metal-ipi-serial |
| ci/prow/e2e-aws-single-node-upgrade | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | false | /test e2e-aws-single-node-upgrade |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | false | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-gcp-ovn-rt-upgrade | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | false | /test e2e-gcp-ovn-rt-upgrade |
| ci/prow/e2e-aws-serial | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | true | /test e2e-aws-serial |
| ci/prow/e2e-aws-single-node | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | false | /test e2e-aws-single-node |
| ci/prow/e2e-aws-disruptive | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-gcp-ovn | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | true | /test e2e-gcp-ovn |
| ci/prow/unit | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | true | /test unit |
| ci/prow/e2e-gcp-ovn-image-ecosystem | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | true | /test e2e-gcp-ovn-image-ecosystem |
| ci/prow/e2e-gcp-ovn-builds | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | true | /test e2e-gcp-ovn-builds |
| ci/prow/e2e-aws-ovn-image-registry | d2235fdb6f3102c467ab88832688bd72ab6dfd98 | link | true | /test e2e-aws-ovn-image-registry |
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closed this PR.