NO-JIRA: Env override NETWORKING_E2E_BOND_MTU

Open mgencur opened this issue 8 months ago • 8 comments

This commit introduces the NETWORKING_E2E_BOND_MTU environment variable. The test that creates a "bond" interface can read it to override the default MTU value. The default value is used when .status.clusterNetworkMTU is undefined on the Network "cluster" resource; in that case the MTU is set automatically by the kernel. The .status.clusterNetworkMTU field might not be defined when using a custom CNI plugin such as Cilium.

We have run into test failures when testing Hypershift/HostedControlPlane. When the management cluster has a specific clusterNetworkMTU and the "hosted" cluster uses the Cilium CNI, the hosted cluster might use a larger MTU than the management cluster. In this case, the test fails with the following error:

ERRORED: error configuring pod [e2e-test-bond-tnxmg/pod1] networking: [e2e-test-bond-tnxmg/pod1/24b00190-fbfa-4ac5-94b6-69fb4b697a04:bondnad1]: error adding container to network "bondnad1": Invalid MTU (1500). The requested MTU for bond is bigger than that of the slave link (net1), slave MTU (1400)

The failure can be seen in this run.

This PR allows overriding the default value of 1500 from the error above with a value matching the slave MTU.
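The constraint behind the error can be illustrated with a small check. This is a sketch under assumptions, not the bond CNI plugin's actual implementation: the function validateBondMTU and the map of link names to MTUs are hypothetical, and only the rule itself (the bond MTU must not exceed the MTU of a slave link) is taken from the error message quoted earlier in this description.

```go
package main

import "fmt"

// validateBondMTU is a hypothetical helper that mirrors the rule from the
// error message: the requested bond MTU must not be larger than the MTU of
// any slave link.
func validateBondMTU(bondMTU int, slaveMTUs map[string]int) error {
	for name, mtu := range slaveMTUs {
		if bondMTU > mtu {
			return fmt.Errorf(
				"invalid MTU (%d): requested bond MTU is bigger than that of slave link %s (MTU %d)",
				bondMTU, name, mtu)
		}
	}
	return nil
}

func main() {
	// Values taken from the failing run: default bond MTU 1500, net1 at 1400.
	slaves := map[string]int{"net1": 1400}

	// The default is rejected because it exceeds the slave MTU.
	fmt.Println(validateBondMTU(1500, slaves))

	// A value matching the slave MTU passes the check.
	fmt.Println(validateBondMTU(1400, slaves))
}
```

With the override set to 1400, the requested bond MTU equals the slave MTU and the check no longer fails.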

mgencur avatar Mar 31 '25 12:03 mgencur

@mgencur: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Mar 31 '25 12:03 openshift-ci-robot

/cherrypick release-1.20
/cherrypick release-1.19

mgencur avatar Mar 31 '25 12:03 mgencur

@mgencur: once the present PR merges, I will cherry-pick it on top of release-1.19, release-1.20 in new PRs and assign them to you.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mgencur
Once this PR has been reviewed and has the lgtm label, please assign adambkaplan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Mar 31 '25 12:03 openshift-ci[bot]

/retest

mgencur avatar Apr 01 '25 06:04 mgencur

/retest

mgencur avatar Apr 17 '25 06:04 mgencur

Job Failure Risk Analysis for sha: 8ee656a9be3fc5fcd279cad3787a24e99e096a76

| Job Name | Failure Risk |
| --- | --- |
| pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 | IncompleteTests. Tests for this run (100) are below the historical average (1428): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |
| pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack-local-gateway | IncompleteTests. Tests for this run (17) are below the historical average (1399): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |
| pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-kube-apiserver-rollout | IncompleteTests. Tests for this run (17) are below the historical average (874): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |

openshift-trt[bot] avatar May 15 '25 05:05 openshift-trt[bot]

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Aug 14 '25 01:08 openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot avatar Sep 13 '25 08:09 openshift-bot

@mgencur: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-ovn-upgrade | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-azure-ovn-upgrade |
| ci/prow/e2e-gcp-ovn-etcd-scaling | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-gcp-ovn-etcd-scaling |
| ci/prow/e2e-azure-ovn-etcd-scaling | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-azure-ovn-etcd-scaling |
| ci/prow/e2e-gcp-fips-serial | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-gcp-fips-serial |
| ci/prow/4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback |
| ci/prow/okd-e2e-gcp | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test okd-e2e-gcp |
| ci/prow/e2e-aws-ovn-etcd-scaling | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-aws-ovn-etcd-scaling |
| ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-metal-ipi-ovn-dualstack-local-gateway |
| ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-vsphere-ovn-dualstack-primaryv6 |
| ci/prow/e2e-aws-disruptive | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-vsphere-ovn-etcd-scaling | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-vsphere-ovn-etcd-scaling |
| ci/prow/e2e-openstack-serial | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-openstack-serial |
| ci/prow/e2e-metal-ipi-virtualmedia | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-metal-ipi-virtualmedia |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview |
| ci/prow/e2e-metal-ipi-ovn-kube-apiserver-rollout | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-metal-ipi-ovn-kube-apiserver-rollout |
| ci/prow/e2e-gcp-disruptive | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | false | /test e2e-gcp-disruptive |
| ci/prow/e2e-aws-ovn-serial | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | true | /test e2e-aws-ovn-serial |
| ci/prow/e2e-aws-ovn-serial-2of2 | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | true | /test e2e-aws-ovn-serial-2of2 |
| ci/prow/e2e-aws-ovn-serial-publicnet | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | true | /test e2e-aws-ovn-serial-publicnet |
| ci/prow/e2e-gcp-csi | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | true | /test e2e-gcp-csi |
| ci/prow/e2e-aws-csi | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | true | /test e2e-aws-csi |
| ci/prow/go-verify-deps | 8ee656a9be3fc5fcd279cad3787a24e99e096a76 | link | true | /test go-verify-deps |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Oct 17 '25 12:10 openshift-ci[bot]

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-bot avatar Nov 17 '25 00:11 openshift-bot

@openshift-bot: Closed this PR.

openshift-ci[bot] avatar Nov 17 '25 00:11 openshift-ci[bot]