NO-JIRA: Pass the port to internal LB deployment in monitoring tests

Open mgencur opened this issue 7 months ago • 7 comments

The .status.apiServerInternalURI of Infrastructure might contain a port (for example https://api.d560406ce00e8ae40e77.hypershift.local:443). In this case the port must be passed to the deployment, because the default port 6443 will not work.

We ran into this issue when testing Hypershift with --endpoint-access=private and external DNS, where the port is 443. The test [Jira: "kube-apiserver"] can collect apiserver.openshift.io/disruption-actor=poller poller pod logs failed in this run; the error was:

Logs for -n e2e-disruption-monitor-j4clg pod/internal-lb-monitor-54cc8ddddd-cndzp
I0423 12:19:55.995391       1 factory.go:193] Registered Plugin "containerd"
openshift-tests v4.1.0-9244-g118fc94
I0423 12:19:56.032703       1 merged_client_builder.go:163] Using in-cluster namespace
I0423 12:19:56.032866       1 merged_client_builder.go:121] Using in-cluster configuration
error: Get "https://api.d560406ce00e8ae40e77.hypershift.local:6443/api/v1/namespaces/default": dial tcp 10.0.128.146:6443: i/o timeout
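
As a rough illustration of the idea (not the actual change in this PR), the port could be taken from the internal URI and only fall back to 6443 when the URI carries none. The helper name internalAPIHostPort and the constant defaultAPIPort are hypothetical; only the example URI and the 6443 default come from the description above.

```go
package main

import (
	"fmt"
	"net/url"
)

// defaultAPIPort is the fallback mentioned in the description; the constant
// name itself is made up for this sketch.
const defaultAPIPort = "6443"

// internalAPIHostPort is a hypothetical helper: it splits an
// apiServerInternalURI value into host and port, and falls back to
// defaultAPIPort only when the URI does not specify a port.
func internalAPIHostPort(apiServerInternalURI string) (host, port string, err error) {
	u, err := url.Parse(apiServerInternalURI)
	if err != nil {
		return "", "", fmt.Errorf("parsing %q: %w", apiServerInternalURI, err)
	}
	if u.Hostname() == "" {
		return "", "", fmt.Errorf("no host in %q", apiServerInternalURI)
	}
	port = u.Port() // empty string when the URI has no explicit port
	if port == "" {
		port = defaultAPIPort
	}
	return u.Hostname(), port, nil
}

func main() {
	for _, uri := range []string{
		"https://api.d560406ce00e8ae40e77.hypershift.local:443",
		"https://api.d560406ce00e8ae40e77.hypershift.local",
	} {
		host, port, err := internalAPIHostPort(uri)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s:%s\n", uri, host, port)
	}
}
```

With the first URI the poller would dial port 443 instead of timing out on 6443 as in the log above; the second URI keeps the existing default.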

mgencur avatar Apr 24 '25 12:04 mgencur

@mgencur: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Apr 24 '25 12:04 openshift-ci-robot

/retest

mgencur avatar Apr 25 '25 06:04 mgencur

/assign @vrutkovs

wangke19 avatar Apr 30 '25 18:04 wangke19

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mgencur, vrutkovs
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar May 05 '25 07:05 openshift-ci[bot]

/retest-required

vrutkovs avatar May 05 '25 07:05 vrutkovs

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Aug 14 '25 01:08 openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot avatar Sep 13 '25 08:09 openshift-bot

Job Failure Risk Analysis for sha: 1f75cdaaa457ad9e4e3c546244a151ac1923c897

Job Name: pull-ci-openshift-origin-main-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback
Failure Risk: MissingData

Job Name: pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling
Failure Risk: High
[bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
This test has passed 98.35% of 2847 runs on release 4.20 [Overall] in the last week.

Open Bugs
CI: API is broken in periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-single-node-techpreview-serial

Job Name: pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling
Failure Risk: High
[bz-etcd][invariant] alert/etcdMembersDown should not be at or above info
This test has passed 99.96% of 2799 runs on release 4.20 [Overall] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time

Job Name: pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling
Failure Risk: Low
[sig-api-machinery] disruption/cache-openshift-api apiserver/openshift-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
[sig-api-machinery] disruption/cache-kube-api apiserver/kube-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
[sig-instrumentation] disruption/metrics-api connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
e2e-aws-ovn-ipsec-upgrade job is failing with disruptive events
---
[sig-api-machinery] disruption/cache-oauth-api apiserver/oauth-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
Showing 4 of 5 test results

openshift-trt[bot] avatar Oct 03 '25 19:10 openshift-trt[bot]

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-merge-robot avatar Nov 04 '25 17:11 openshift-merge-robot

@mgencur: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-agnostic-ovn-cmd | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-agnostic-ovn-cmd |
| ci/prow/e2e-metal-ipi-ovn-kube-apiserver-rollout | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-metal-ipi-ovn-kube-apiserver-rollout |
| ci/prow/e2e-metal-ipi-serial-ovn-ipv6 | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-metal-ipi-serial-ovn-ipv6 |
| ci/prow/e2e-azure-ovn-upgrade | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-azure-ovn-upgrade |
| ci/prow/e2e-openstack-serial | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-openstack-serial |
| ci/prow/e2e-vsphere-ovn-etcd-scaling | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-vsphere-ovn-etcd-scaling |
| ci/prow/e2e-metal-ipi-serial | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-metal-ipi-serial |
| ci/prow/okd-e2e-gcp | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test okd-e2e-gcp |
| ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-vsphere-ovn-dualstack-primaryv6 |
| ci/prow/e2e-gcp-ovn-etcd-scaling | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-gcp-ovn-etcd-scaling |
| ci/prow/4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback |
| ci/prow/e2e-aws-disruptive | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-gcp-disruptive | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-gcp-disruptive |
| ci/prow/e2e-azure-ovn-etcd-scaling | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-azure-ovn-etcd-scaling |
| ci/prow/e2e-gcp-fips-serial | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-gcp-fips-serial |
| ci/prow/e2e-aws-ovn-kube-apiserver-rollout | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-aws-ovn-kube-apiserver-rollout |
| ci/prow/e2e-aws-ovn-etcd-scaling | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | false | /test e2e-aws-ovn-etcd-scaling |
| ci/prow/e2e-aws-ovn-serial | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | true | /test e2e-aws-ovn-serial |
| ci/prow/e2e-aws-ovn-serial-publicnet | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | true | /test e2e-aws-ovn-serial-publicnet |
| ci/prow/e2e-gcp-csi | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | true | /test e2e-gcp-csi |
| ci/prow/e2e-aws-csi | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | true | /test e2e-aws-csi |
| ci/prow/go-verify-deps | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | true | /test go-verify-deps |
| ci/prow/e2e-aws-ovn-microshift-serial | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | true | /test e2e-aws-ovn-microshift-serial |
| ci/prow/e2e-aws-ovn-microshift | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | true | /test e2e-aws-ovn-microshift |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 1f75cdaaa457ad9e4e3c546244a151ac1923c897 | link | true | /test e2e-metal-ipi-ovn-ipv6 |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Nov 18 '25 12:11 openshift-ci[bot]