NO-ISSUE: Fix TNA and TNF dummy ip for ipv6

Open giladravid16 opened this issue 5 months ago • 3 comments

giladravid16 avatar Sep 08 '25 07:09 giladravid16

@giladravid16: This pull request explicitly references no jira issue.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Sep 08 '25 07:09 openshift-ci-robot

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

openshift-ci[bot] avatar Sep 08 '25 07:09 openshift-ci[bot]

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Dec 07 '25 09:12 openshift-bot

@giladravid16: This pull request references MGMT-22546 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

In response to this:

A bit of background: when installing TNA/TNF clusters using the assisted service, one of the master nodes acts as the bootstrap. During the installation there is therefore only one master node, but we need two in order to configure keepalived. We cannot wait until the bootstrap finishes and becomes a master, because then no node would hold the API VIP. To work around this we temporarily add a dummy IP to the list of nodes. After the bootstrap becomes a master node, its IP replaces the dummy IP in the list.

What does this PR do: right now the dummy IP is always 0.0.0.0, but that doesn't work for clusters using IPv6. This PR makes the dummy IP match the VIP's address family: if the VIP is an IPv4 address the dummy IP stays 0.0.0.0; otherwise it is ::.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Dec 24 '25 13:12 openshift-ci-robot

After this is merged, can it also be backported all the way to 4.20?

giladravid16 avatar Dec 24 '25 16:12 giladravid16

@giladravid16 yes it can, but you are responsible for Jira hygiene. You need a bug opened with the Target Version field set to 4.22.0; only then can we proceed with the backport.

mkowalski avatar Jan 07 '26 09:01 mkowalski

/test ?

mkowalski avatar Jan 07 '26 09:01 mkowalski

@mkowalski: The following commands are available to trigger required jobs:

/test e2e-metal-ipi-ovn-ipv6
/test gofmt
/test govet
/test images
/test okd-scos-images
/test security
/test unit
/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv4
/test e2e-openstack
/test okd-scos-e2e-aws-ovn

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-baremetal-runtimecfg-main-e2e-metal-ipi-ovn-ipv6
pull-ci-openshift-baremetal-runtimecfg-main-gofmt
pull-ci-openshift-baremetal-runtimecfg-main-govet
pull-ci-openshift-baremetal-runtimecfg-main-images
pull-ci-openshift-baremetal-runtimecfg-main-okd-scos-images
pull-ci-openshift-baremetal-runtimecfg-main-security
pull-ci-openshift-baremetal-runtimecfg-main-unit
pull-ci-openshift-baremetal-runtimecfg-main-verify-deps

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] avatar Jan 07 '26 09:01 openshift-ci[bot]

/payload ?

mkowalski avatar Jan 07 '26 09:01 mkowalski

@mkowalski: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

openshift-ci[bot] avatar Jan 07 '26 09:01 openshift-ci[bot]

/payload periodic-ci-openshift-release-master-nightly-4.22-e2e-agent-ovn-two-node-arbiter-dualstack

mkowalski avatar Jan 07 '26 09:01 mkowalski

@mkowalski: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

openshift-ci[bot] avatar Jan 07 '26 09:01 openshift-ci[bot]

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-agent-ovn-two-node-arbiter-dualstack

mkowalski avatar Jan 07 '26 09:01 mkowalski

@mkowalski: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-agent-ovn-two-node-arbiter-dualstack

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e8b9efc0-ebab-11f0-816d-1074d99e701d-0

openshift-ci[bot] avatar Jan 07 '26 09:01 openshift-ci[bot]

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ipi-ovn-dualstack-techpreview

mkowalski avatar Jan 07 '26 09:01 mkowalski

@mkowalski: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ipi-ovn-dualstack-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/08c0f110-ebac-11f0-830b-52dbe5cfa958-0

openshift-ci[bot] avatar Jan 07 '26 09:01 openshift-ci[bot]

/approve /lgtm

/hold Waiting for payload jobs to succeed

mkowalski avatar Jan 07 '26 09:01 mkowalski

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-agent-ovn-two-node-arbiter-ipv6

mkowalski avatar Jan 07 '26 09:01 mkowalski

@mkowalski: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-agent-ovn-two-node-arbiter-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5a664740-ebac-11f0-8322-31ccbdeeac4f-0

openshift-ci[bot] avatar Jan 07 '26 09:01 openshift-ci[bot]

/lgtm cancel

@giladravid16, even with your patch the e2e-agent-ovn-two-node-arbiter-ipv6 job failed. Please look at https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-baremetal-runtimecfg-369-nightly-4.22-e2e-agent-ovn-two-node-arbiter-ipv6/2008835257893130240 and figure out what went wrong.

Did you manually test this patch and confirm it worked? Or is this just an attempt at a fix?

mkowalski avatar Jan 07 '26 12:01 mkowalski

@mkowalski I tested it with Assisted's CI in https://github.com/openshift/release/pull/72884. I used a custom release image of OCP 4.20 and this PR.

giladravid16 avatar Jan 07 '26 13:01 giladravid16

I tested it with Assisted's CI in openshift/release#72884

Do you mean the ci/rehearse/openshift/assisted-service/master/edge-e2e-metal-assisted-kube-api-tna-4-19 test or some other? If some other, can I please get a link to the passing Prow job? I am trying to see something that was IPv6 and succeeded.

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-agent-ovn-two-node-arbiter-ipv6

mkowalski avatar Jan 07 '26 14:01 mkowalski

@mkowalski: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-agent-ovn-two-node-arbiter-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/56d17600-ebd3-11f0-8308-6f803e4d321f-0

openshift-ci[bot] avatar Jan 07 '26 14:01 openshift-ci[bot]

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-agent-ovn-two-node-arbiter-ipv6

mkowalski avatar Jan 07 '26 14:01 mkowalski

@mkowalski: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-agent-ovn-two-node-arbiter-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/76af1d60-ebd3-11f0-9c0c-89ae09dab6f5-0

openshift-ci[bot] avatar Jan 07 '26 14:01 openshift-ci[bot]

@mkowalski yes, that's the job. The e2e-agent-ovn-two-node-arbiter-ipv6 job failed before the installation started, during preparing-for-installation. The reason is that the arbiter node was unable to pull an image, even though the masters were able to. I'm pretty sure the issue is that the arbiter node doesn't have enough RAM: during this phase each host's filesystem is sized at half of its RAM. The arbiter has 8GB of RAM, so its filesystem is 4GB, and the image it fails to pull is 2GB.
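The sizing argument above can be sketched as follows; the factor-of-two pull overhead (space for both the compressed layers and the unpacked copy) is an assumption for illustration, not something stated in the thread:

```go
package main

import "fmt"

// pullFits is a rough, hypothetical model of the comment above:
// during preparing-for-installation each host's filesystem is sized
// at half of its RAM, and a pull is assumed to need space for both
// the compressed layers and the unpacked image.
func pullFits(ramGiB, imageGiB float64) bool {
	fsGiB := ramGiB / 2    // filesystem is half of RAM
	needed := 2 * imageGiB // assumed compressed + unpacked
	return needed < fsGiB
}

func main() {
	// Arbiter from the thread: 8 GiB RAM -> 4 GiB filesystem, 2 GiB image.
	fmt.Println(pullFits(8, 2)) // false: the pull does not fit
}
```

Under this assumed model the 2GB image needs about 4GB of scratch space, which is exactly the arbiter's filesystem capacity, consistent with the pull failing there while succeeding on the larger masters.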

giladravid16 avatar Jan 07 '26 14:01 giladravid16

The jobs where I am sure we do IPv6 are

  1. https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-baremetal-runtimecfg-369-nightly-4.21-e2e-agent-ovn-two-node-arbiter-ipv6/2008905598048931840
  2. https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-baremetal-runtimecfg-369-nightly-4.22-e2e-agent-ovn-two-node-arbiter-ipv6/2008905376468045824

but they do not seem to pass with your PR.

I don't see how ci/rehearse/openshift/assisted-service/master/edge-e2e-metal-assisted-kube-api-tna-4-19 would be an IPv6 job; you need to help me understand this. I have looked at the logs from https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/72884/rehearse-72884-pull-ci-openshift-assisted-service-master-edge-e2e-metal-assisted-kube-api-tna-4-19/2003771232951996416/artifacts/e2e-metal-assisted-kube-api-tna-4-19/assisted-common-gather/artifacts but the nodes there have IPv4 addresses.

mkowalski avatar Jan 07 '26 15:01 mkowalski

The job ci/rehearse/openshift/assisted-service/master/edge-e2e-metal-assisted-kube-api-tna-4-19 installs 2 clusters: one uses IPv4 and the other IPv6. The files belonging to the IPv6 cluster have assisted-spoke-cluster-f62795d5 in their names. For example, here's a node in the cluster, and the agent cluster install (where you can see the VIPs).

And as I said in my previous comment, the jobs you ran fail before the installation starts. You can see it in the job's logs: the arbiter can't pull quay-proxy.ci.openshift.org/openshift/ci@sha256:aea3543b56f95f21fd574aff73c2ae7baffca24a77a7f75c26617be2e424a678, and I think it's because it doesn't have enough space for it. Compare it to the periodic job's logs, where the installation does start but gets stuck on waiting-for-bootkube.

giladravid16 avatar Jan 08 '26 07:01 giladravid16

/approved /lgtm /verified by @giladravid16

mkowalski avatar Jan 08 '26 15:01 mkowalski