OCPBUGS-59176: fix several failing tests in gcp-custom-dns job
Some e2e tests are failing in the job "gcp-custom-dns" for the feature gate "GCPClusterHostedDNSInstall", which is promoted to GA in 4.20. In a "custom-dns" cluster, OpenShift starts static CoreDNS pods to provide DNS resolution for the API, Internal API, and Ingress services that are essential for cluster creation. After cluster deployment is completed, the customer updates their external DNS solution with the same assigned LB IP addresses used to configure the internal CoreDNS instance.
The failing tests, such as the http2 and grpc tests, use dedicated ingresscontrollers, and the gateway also has a separate LB and dnsrecord, so the default wildcard record created by the new static CoreDNS does not work for those tests.
To fix the failing tests, we can force the requests to use the LoadBalancer IP address directly and bypass DNS resolution.
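As a rough illustration of bypassing DNS, here is a minimal sketch using only the Go standard library. It is not the code from this PR: `newDirectClient`, `fetchVia`, `demo`, and the host name `route.shard.example.invalid` are hypothetical, and a local `httptest` server stands in for the ingresscontroller's load balancer.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// newDirectClient returns an HTTP client whose connections always go to
// lbAddr ("IP:port"), whatever host name appears in the request URL.
// The URL host is still sent as the Host header, so the router can match the route.
func newDirectClient(lbAddr string) *http.Client {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	return &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			// Ignore the address derived from the URL and dial the LB instead.
			DialContext: func(ctx context.Context, network, _ string) (net.Conn, error) {
				return dialer.DialContext(ctx, network, lbAddr)
			},
		},
	}
}

// fetchVia requests url through lbAddr and returns the response body.
func fetchVia(lbAddr, url string) (string, error) {
	resp, err := newDirectClient(lbAddr).Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// demo starts a local server that echoes the Host header it received and
// fetches a URL whose host name is never resolved.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "host=%s", r.Host)
	}))
	defer srv.Close()
	return fetchVia(srv.Listener.Addr().String(), "http://route.shard.example.invalid/")
}

func main() {
	body, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // prints "host=route.shard.example.invalid"
}
```

Overriding only the dial step keeps the request URL, and therefore the Host header, unchanged, so the router behind the LB can still select the route by host name.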
Also update the http2/grpc shard ingresscontroller to NOT use a domain like "e2e-test-xxx.apps.<baseDomain>", to avoid overlapping with the default wildcard "*.apps.<baseDomain>".
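The overlap can be checked with a simple suffix comparison. The sketch below is illustrative only: `coveredByWildcard` is a hypothetical helper, `example.com` stands in for `<baseDomain>`, and the check assumes no more specific record exists under the wildcard.

```go
package main

import (
	"fmt"
	"strings"
)

// coveredByWildcard reports whether host falls under a DNS wildcard such as
// "*.apps.example.com". Names at any depth below the wildcard's parent match.
func coveredByWildcard(host, wildcard string) bool {
	suffix := strings.ToLower(strings.TrimPrefix(wildcard, "*")) // ".apps.example.com"
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	return len(host) > len(suffix) && strings.HasSuffix(host, suffix)
}

func main() {
	wildcard := "*.apps.example.com"
	for _, h := range []string{
		"e2e-test-xxx.apps.example.com",     // shard domain under the default wildcard
		"foo.e2e-test-xxx.apps.example.com", // a route host in that shard domain
		"e2e-test-xxx.example.com",          // shard domain outside "apps"
	} {
		fmt.Printf("%s covered=%v\n", h, coveredByWildcard(h, wildcard))
	}
}
```

A shard domain that reports `covered=true` would resolve through the default wildcard to the default ingress LB rather than to the shard's own LB.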
@lihongan: This pull request references Jira Issue OCPBUGS-59176, which is valid.
3 validation(s) were run on this bug
- bug is open, matching expected state (open)
- bug target version (4.20.0) matches configured target version for branch (4.20.0)
- bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Requesting review from QA contact: /cc @lihongan
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: lihongan.
Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/hold The gRPC DialContext is not fixed yet.
/assign
As a continuation of https://github.com/openshift/origin/pull/29985.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: lihongan Once this PR has been reviewed and has the lgtm label, please ask for approval from alebedev87. For more information see the Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/unhold The gRPC dialer is updated as well so that it can send requests to the LB directly if DNS does not work.
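A fallback dialer of this kind could look roughly like the sketch below, which uses only the standard library and is not the code from this PR. `fallbackDialer`, `lookupFunc`, and `demo` are hypothetical names. The returned function has the signature that `grpc.WithContextDialer` accepts, and `net.DefaultResolver.LookupHost` can serve as the lookup function.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"
)

// lookupFunc resolves a host name to IP addresses.
type lookupFunc func(ctx context.Context, host string) ([]string, error)

// fallbackDialer returns a dial function that uses normal resolution when the
// lookup succeeds and dials lbIP (keeping the original port) when it fails.
func fallbackDialer(lookup lookupFunc, lbIP string) func(context.Context, string) (net.Conn, error) {
	d := &net.Dialer{Timeout: 5 * time.Second}
	return func(ctx context.Context, addr string) (net.Conn, error) {
		host, port, err := net.SplitHostPort(addr)
		if err != nil {
			return nil, err
		}
		if _, err := lookup(ctx, host); err != nil {
			// DNS does not know this name: go straight to the load balancer.
			addr = net.JoinHostPort(lbIP, port)
		}
		return d.DialContext(ctx, "tcp", addr)
	}
}

// demo dials an unresolvable name and reports whether the connection
// reached the local listener that stands in for the load balancer.
func demo() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()
	_, port, err := net.SplitHostPort(ln.Addr().String())
	if err != nil {
		return false, err
	}

	// Stub resolver that always fails, simulating a missing DNS record.
	failing := func(context.Context, string) ([]string, error) {
		return nil, errors.New("no such host")
	}
	dial := fallbackDialer(failing, "127.0.0.1")
	conn, err := dial(context.Background(), net.JoinHostPort("grpc.shard.example.invalid", port))
	if err != nil {
		return false, err
	}
	defer conn.Close()
	return conn.RemoteAddr().String() == ln.Addr().String(), nil
}

func main() {
	ok, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("reached LB via fallback:", ok)
}
```

Injecting the lookup function keeps the fallback path testable without depending on how the local resolver treats unknown names.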
Job Failure Risk Analysis for sha: ae965e978cffcf821bd60582102548e3fff18c35
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-main-e2e-aws-disruptive | Medium [sig-arch] events should not repeat pathologically for ns/openshift-kube-apiserver-operator Potential external regression detected for High Risk Test analysis --- [sig-node] static pods should start after being created Potential external regression detected for High Risk Test analysis --- [bz-Etcd] clusteroperator/etcd should not change condition/Available Potential external regression detected for High Risk Test analysis --- [sig-cli][OCPFeatureGate:UpgradeStatus] oc amd upgrade status never fails Potential external regression detected for High Risk Test analysis |
/retest-required
/retest-required
/retest-required
/test e2e-gcp-ovn-techpreview-serial-2of2
/payload-job-with-prs periodic-ci-openshift-release-master-nightly-4.20-e2e-gcp-custom-dns-techpreview https://github.com/openshift/release/pull/68515
@lihongan: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.
/payload-job-with-prs periodic-ci-openshift-release-master-nightly-4.20-e2e-gcp-custom-dns-techpreview openshift/release#68515
@alebedev87: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
- periodic-ci-openshift-release-master-nightly-4.20-e2e-gcp-custom-dns-techpreview
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7bc283a0-8346-11f0-940f-c2b3445376f7-0
/payload-job-with-prs periodic-ci-openshift-release-master-nightly-4.20-e2e-gcp-custom-dns-techpreview openshift/release#68515
Thank you, Andrew. It looks like the job you triggered failed at install. Let me retest.
@lihongan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
- periodic-ci-openshift-release-master-nightly-4.20-e2e-gcp-custom-dns-techpreview
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c0f7d490-83b2-11f0-9c8f-1999f28ec0d9-0
PR needs rebase.
@lihongan: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-metal-ipi-ovn-kube-apiserver-rollout | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-metal-ipi-ovn-kube-apiserver-rollout |
| ci/prow/e2e-metal-ipi-virtualmedia | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-metal-ipi-virtualmedia |
| ci/prow/e2e-aws-disruptive | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-metal-ipi-serial-1of2 | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-metal-ipi-serial-1of2 |
| ci/prow/e2e-metal-ipi-serial-2of2 | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-metal-ipi-serial-2of2 |
| ci/prow/e2e-metal-ipi-serial-ovn-ipv6-2of2 | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-metal-ipi-serial-ovn-ipv6-2of2 |
| ci/prow/e2e-aws-proxy | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-aws-proxy |
| ci/prow/e2e-aws-ovn-single-node-upgrade | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-aws-ovn-kube-apiserver-rollout | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-aws-ovn-kube-apiserver-rollout |
| ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-metal-ipi-ovn-dualstack-local-gateway |
| ci/prow/e2e-openstack-ovn | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-metal-ipi-serial-ovn-ipv6-1of2 | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-metal-ipi-serial-ovn-ipv6-1of2 |
| ci/prow/e2e-metal-ipi-ovn | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-metal-ipi-ovn |
| ci/prow/e2e-metal-ipi-ovn-dualstack | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | false | /test e2e-metal-ipi-ovn-dualstack |
| ci/prow/e2e-gcp-csi | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | true | /test e2e-gcp-csi |
| ci/prow/e2e-aws-csi | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | true | /test e2e-aws-csi |
| ci/prow/go-verify-deps | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | true | /test go-verify-deps |
| ci/prow/e2e-aws-ovn-microshift | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | true | /test e2e-aws-ovn-microshift |
| ci/prow/e2e-aws-ovn-microshift-serial | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | true | /test e2e-aws-ovn-microshift-serial |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 88d6ced25b6b44592780cc5014c0bee8765866bb | link | true | /test e2e-metal-ipi-ovn-ipv6 |
Full PR test history. Your PR dashboard.
Job Failure Risk Analysis for sha: 88d6ced25b6b44592780cc5014c0bee8765866bb
| Job Name | Failure Risk |
|---|---|
| pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade | Medium Job run should complete before timeout This test has passed 91.46% of 5282 runs on release 4.21 [Overall] in the last week. |
| pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 | IncompleteTests Tests for this run (2) are below the historical average (2444): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |