OCPBUGS-26601: Re-enable test/extended/router/http2 tests on AWS
It's been a long time since we disabled these tests on AWS. I have been running the http2 tests on AWS all week and I haven't run into the issue once. Let's re-enable the http2 x AWS tests for better coverage.
This PR also addresses an intermittent issue encountered in AWS environments during the router's h2spec conformance tests. Hostname resolution within the cluster was slow, which resulted in frequent timeouts. AWS resolved hostnames more slowly than Azure or GCP, which hints at differences in DNS handling.
The fix resolves the hostname on the test host before starting the h2spec tests within the cluster. With this change the h2spec test completes in approximately 85 seconds, down from a previous average of over 376 seconds (roughly 6 minutes 16 seconds, well past the 5-minute mark).
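As a rough illustration of the "resolve on the test host first" idea, here is a minimal, standard-library-only sketch. It is not the code in this PR: `resolveBeforeTest`, `lookupFunc`, `errNoAddrs`, the attempt count, and the `localhost` placeholder hostname are all hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// lookupFunc has the same shape as net.LookupHost, so a fake resolver
// can be substituted when no DNS is available. (Hypothetical helper type.)
type lookupFunc func(host string) ([]string, error)

// errNoAddrs is returned when a lookup succeeds but yields no addresses.
var errNoAddrs = errors.New("lookup returned no addresses")

// resolveBeforeTest retries lookup until host resolves to at least one
// address or the attempts are exhausted. The caller would only start the
// in-cluster h2spec run once this returns successfully.
func resolveBeforeTest(lookup lookupFunc, host string, attempts int, interval time.Duration) ([]string, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		addrs, err := lookup(host)
		if err == nil && len(addrs) > 0 {
			return addrs, nil
		}
		if err == nil {
			err = errNoAddrs
		}
		lastErr = err
		// Do not sleep after the final attempt.
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return nil, fmt.Errorf("resolving %q failed after %d attempts: %w", host, attempts, lastErr)
}

func main() {
	// "localhost" stands in for the route hostname used by the test.
	addrs, err := resolveBeforeTest(net.LookupHost, "localhost", 3, 2*time.Second)
	if err != nil {
		fmt.Println("not resolvable yet:", err)
		return
	}
	fmt.Println("resolved:", addrs)
}
```

Passing the resolver in as a function keeps the retry logic testable without touching real DNS; in a real test the argument would simply be `net.LookupHost`.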
The difference in resolution times suggests environmental variation, particularly in AWS, but this PR does not definitively attribute the issue to negative DNS caching; it prioritises the measured improvement from the new approach. As a precaution, the polling interval and overall test timeout have been adjusted to 2 seconds and 10 minutes, respectively, to improve test success rates across cloud environments.
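To make the polling settings concrete, the following is a small sketch of a poll loop using a 2-second interval and a 10-minute overall timeout. It uses only the standard library and is an assumption about the general shape, not the helper the test actually calls; `pollUntil`, `pollInterval`, `pollTimeout`, and `errPollTimeout` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Values described in this PR: poll every 2 seconds, give up after 10 minutes.
const (
	pollInterval = 2 * time.Second
	pollTimeout  = 10 * time.Minute
)

// errPollTimeout is returned when the condition never became true in time.
var errPollTimeout = errors.New("timed out waiting for condition")

// pollUntil calls cond immediately and then once per interval until cond
// reports true, cond returns an error, or the next attempt would start
// after the deadline.
func pollUntil(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return errPollTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	// Short durations here so the demo finishes quickly; the test itself
	// would pass pollInterval and pollTimeout.
	attempts := 0
	err := pollUntil(10*time.Millisecond, time.Second, func() (bool, error) {
		attempts++
		return attempts >= 3, nil
	})
	fmt.Println("attempts:", attempts, "err:", err)
	fmt.Println("settings described in the PR:", pollInterval, pollTimeout)
}
```

With a 2-second interval, a 10-minute timeout allows on the order of 300 attempts, which leaves ample headroom over the roughly 85-second runs reported above.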
This PR represents a practical win in terms of improved test efficiency, while acknowledging potential environmental differences for further investigation, if needed, in the future.
Original bug: https://bugzilla.redhat.com/show_bug.cgi?id=1912413
@frobware: This pull request references Jira Issue OCPBUGS-26601, which is invalid:
- expected the bug to target the "4.16.0" version, but no target version was set
Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh
@frobware: This pull request references Jira Issue OCPBUGS-26601, which is valid. The bug has been moved to the POST state.
3 validation(s) were run on this bug
- bug is open, matching expected state (open)
- bug target version (4.16.0) matches configured target version for branch (4.16.0)
- bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
Requesting review from QA contact: /cc @lihongan
See https://github.com/openshift/origin/pull/26089
/approve
/lgtm
/retest-required
Remaining retests: 0 against base HEAD e913a6484ff96d09171d9fd609096331b4b1cbfe and 2 for PR HEAD 09eb1f04ebab889fa1428b8218195528e7e276db in total
/retest-required
Remaining retests: 0 against base HEAD 52f2f6b6e66a07134657845c0d4ee4d557e80af7 and 1 for PR HEAD 09eb1f04ebab889fa1428b8218195528e7e276db in total
/retest-required
Remaining retests: 0 against base HEAD 663c840b1147e4eaf1e0576fc4e1c5d391e5f3ab and 0 for PR HEAD 09eb1f04ebab889fa1428b8218195528e7e276db in total
/hold
Revision 09eb1f04ebab889fa1428b8218195528e7e276db was retested 3 times: holding
/retest
@frobware: This pull request references Jira Issue OCPBUGS-26601, which is valid.
3 validation(s) were run on this bug
- bug is open, matching expected state (open)
- bug target version (4.16.0) matches configured target version for branch (4.16.0)
- bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Requesting review from QA contact: /cc @lihongan
/test e2e-aws-ovn-upi
@lihongan: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:
- /test e2e-aws-jenkins
- /test e2e-aws-ovn-fips
- /test e2e-aws-ovn-image-registry
- /test e2e-aws-ovn-serial
- /test e2e-gcp-ovn
- /test e2e-gcp-ovn-builds
- /test e2e-gcp-ovn-image-ecosystem
- /test e2e-gcp-ovn-upgrade
- /test e2e-metal-ipi-ovn-ipv6
- /test images
- /test lint
- /test unit
- /test verify
- /test verify-deps
The following commands are available to trigger optional jobs:
- /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback
- /test e2e-agnostic-ovn-cmd
- /test e2e-aws
- /test e2e-aws-csi
- /test e2e-aws-disruptive
- /test e2e-aws-etcd-recovery
- /test e2e-aws-multitenant
- /test e2e-aws-ovn
- /test e2e-aws-ovn-cgroupsv2
- /test e2e-aws-ovn-etcd-scaling
- /test e2e-aws-ovn-kubevirt
- /test e2e-aws-ovn-single-node
- /test e2e-aws-ovn-single-node-serial
- /test e2e-aws-ovn-single-node-upgrade
- /test e2e-aws-ovn-upgrade
- /test e2e-aws-proxy
- /test e2e-azure
- /test e2e-azure-ovn-etcd-scaling
- /test e2e-baremetalds-kubevirt
- /test e2e-gcp-csi
- /test e2e-gcp-disruptive
- /test e2e-gcp-fips-serial
- /test e2e-gcp-ovn-etcd-scaling
- /test e2e-gcp-ovn-rt-upgrade
- /test e2e-gcp-ovn-techpreview
- /test e2e-gcp-ovn-techpreview-serial
- /test e2e-metal-ipi-ovn-dualstack
- /test e2e-metal-ipi-sdn
- /test e2e-metal-ipi-serial
- /test e2e-metal-ipi-serial-ovn-ipv6
- /test e2e-metal-ipi-virtualmedia
- /test e2e-openstack-ovn
- /test e2e-openstack-serial
- /test e2e-vsphere
- /test e2e-vsphere-ovn-etcd-scaling
- /test okd-e2e-gcp
Use /test all to run the following jobs that were automatically triggered:
- pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd
- pull-ci-openshift-origin-master-e2e-aws-csi
- pull-ci-openshift-origin-master-e2e-aws-ovn-cgroupsv2
- pull-ci-openshift-origin-master-e2e-aws-ovn-fips
- pull-ci-openshift-origin-master-e2e-aws-ovn-serial
- pull-ci-openshift-origin-master-e2e-aws-ovn-single-node
- pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial
- pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade
- pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade
- pull-ci-openshift-origin-master-e2e-gcp-csi
- pull-ci-openshift-origin-master-e2e-gcp-ovn
- pull-ci-openshift-origin-master-e2e-gcp-ovn-rt-upgrade
- pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade
- pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6
- pull-ci-openshift-origin-master-e2e-metal-ipi-sdn
- pull-ci-openshift-origin-master-e2e-openstack-ovn
- pull-ci-openshift-origin-master-images
- pull-ci-openshift-origin-master-lint
- pull-ci-openshift-origin-master-unit
- pull-ci-openshift-origin-master-verify
- pull-ci-openshift-origin-master-verify-deps
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This probably requires https://github.com/openshift/cloud-provider-aws/pull/57
/retest-required
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: candita, frobware
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~test/extended/router/OWNERS~~ [frobware]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/unhold
/retest-required
Remaining retests: 0 against base HEAD 7812f3cbbadfff5e7570f48b2dce3331ab24b729 and 2 for PR HEAD 00ea63b861570609c0bd7b02254c303772ea5b33 in total
/hold
I think the consensus was that this PR still requires https://github.com/openshift/cloud-provider-aws/pull/57.
Slack discussion: https://redhat-internal.slack.com/archives/CBWMXQJKD/p1704908895477469.
^ openshift/cloud-provider-aws#57 has merged.
/test all
/retest
/jira refresh
The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target different OCP), recalculating validity.
@openshift-bot: This pull request references Jira Issue OCPBUGS-26601, which is valid.
3 validation(s) were run on this bug
- bug is open, matching expected state (open)
- bug target version (4.16.0) matches configured target version for branch (4.16.0)
- bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Requesting review from QA contact: /cc @lihongan
/jira refresh
The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target different OCP), recalculating validity.
@openshift-bot: This pull request references Jira Issue OCPBUGS-26601, which is invalid:
- expected the bug to target either version "4.17." or "openshift-4.17.", but it targets "4.16.0" instead
Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.
/test all
/test all
failed to initialize the cluster: Cluster operators authentication, console, control-plane-machine-set, image-registry, ingress, machine-api, monitoring are not available
/test e2e-gcp-ovn-upgrade
/hold cancel
From https://redhat-internal.slack.com/archives/CBWMXQJKD/p1715681553819319?thread_ts=1704908895.477469&cid=CBWMXQJKD
We still don't have an accepted nightly build that includes the PR, but today I ran a flexy job and installed an AWS UPI cluster with 4.16.0-0.nightly-2024-05-13-102953 (rejected). I believe the issue is fixed: the ingresscontroller as well as the LB service can be deleted within about 1'20'', and no k8s rules are leaking in Security Groups.
cc @lihongan