
(Experimental) bare-metal with IPv6

Open justinsb opened this pull request 1 year ago • 8 comments

IPv6 brings some new complexities, particularly around IPAM.

We create a test and then fix a few things:

  • We need to assign the podCIDR for IPv6, so we add support to kops-controller. The source of this information is the Host CRD.
  • Because we are assigning the podCIDR from the Host CRD, we need Host records for the control-plane nodes. However, there are bootstrapping problems around creating a CRD during enrollment of the control-plane nodes, so instead we can now generate a Host object in YAML and apply it separately (see the sketch after this list). A high-security workflow would probably create the Host records separately anyway, because they are how we validate nodes.
  • Previously we always set the kubelet cloud-provider=external flag, but this assumes we are running a CCM. If we are not running a CCM (as on metal), we should not set the flag; if we do, kubelet sets the node.kops.k8s.io/uninitialized taint for the CCM to clear, and nothing ever clears it.
  • We need to make sure there is an IPv6 default route so that kubelet can discover its node IP correctly. We could put this into the Host CRD, but it does seem that most nodes will have a default route anyway.
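
For illustration, here is a rough sketch of what such a pre-created Host object could look like with an IPv6 podCIDR attached. The apiVersion, namespace, and field names (instanceGroup, publicKey, podCIDRs) are assumptions based on the description above, not taken from this PR, and may not match the final schema:

    # Hypothetical Host object, generated ahead of time and applied separately
    # (for example with kubectl apply -f host-vm0.yaml) before the node enrolls.
    apiVersion: kops.k8s.io/v1alpha2
    kind: Host
    metadata:
      name: vm0
      namespace: kube-system
    spec:
      # Instance group this bare-metal host belongs to (assumed field name).
      instanceGroup: control-plane
      # Public key used to validate the node during enrollment (assumed field name).
      publicKey: "ssh-ed25519 AAAA... (placeholder)"
      # IPv6 pod CIDR for kops-controller to assign to the Node (assumed field name).
      podCIDRs:
        - 2001:db8:0:1::/64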

justinsb avatar Nov 11 '24 16:11 justinsb

I am trying to upload this, and then I can rebase as I/we fix each problem.

Current problem is from nodeup:

vm0 nodeup[703]: W1111 17:07:00.322041     703 main.go:133] got error running nodeup (will retry in 30s): error building loader: building *model.PrefixBuilder: kOps IPAM controller not supported on cloud "metal"

So we need to decide how the podCIDR is assigned!

justinsb avatar Nov 11 '24 16:11 justinsb

/retest

justinsb avatar Nov 17 '24 15:11 justinsb

/retest

justinsb avatar Feb 20 '25 16:02 justinsb

/retest

justinsb avatar Feb 21 '25 14:02 justinsb

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 22 '25 15:05 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jun 24 '25 21:06 k8s-triage-robot

/remove-lifecycle rotten

hakman avatar Jul 04 '25 12:07 hakman

cc @hakman

I think this is now uncontroversial (I hope). We assign podCIDRs to nodes if they are configured on the Host object. If users don't want to do that, they just don't set podCIDRs on the Host object.
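
As a rough sketch of that opt-in behaviour (apiVersion and field names are assumptions, not taken from this PR): a Host that carries podCIDRs gets them copied to the corresponding Node's spec.podCIDRs, while a Host without the field is left alone.

    # Hypothetical Host with podCIDRs: kops-controller assigns them to the Node,
    # so the Node ends up with spec.podCIDRs: ["2001:db8:0:2::/64"].
    apiVersion: kops.k8s.io/v1alpha2
    kind: Host
    metadata:
      name: vm1
    spec:
      podCIDRs:
        - 2001:db8:0:2::/64
    ---
    # Hypothetical Host without podCIDRs (other spec fields omitted): kops-controller
    # makes no assignment, and pod CIDRs are managed however the cluster already manages them.
    apiVersion: kops.k8s.io/v1alpha2
    kind: Host
    metadata:
      name: vm2
    spec: {}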

justinsb avatar Jul 26 '25 16:07 justinsb

> cc @hakman
>
> I think this is now uncontroversial (I hope). We assign podCIDRs to nodes if they are configured on the Host object. If users don't want to do that, they just don't set podCIDRs on the Host object.

Cool, I will take a look soon. 🚀

hakman avatar Jul 26 '25 17:07 hakman

/retest

justinsb avatar Jul 27 '25 01:07 justinsb

/test all

hakman avatar Jul 27 '25 04:07 hakman

/hold in case you want to update the other APIs (could also be a separate PR).

/lgtm

/approve

hakman avatar Jul 27 '25 04:07 hakman

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Jul 27 '25 04:07 k8s-ci-robot

/test pull-kops-e2e-k8s-aws-amazonvpc

hakman avatar Jul 27 '25 04:07 hakman

/test pull-kops-e2e-k8s-gce-cilium

hakman avatar Jul 27 '25 04:07 hakman

/test pull-kops-e2e-k8s-aws-calico

hakman avatar Jul 27 '25 05:07 hakman

/test pull-kops-e2e-k8s-aws-amazonvpc

hakman avatar Jul 27 '25 06:07 hakman

/hold cancel

I propose adding a round-trip test alongside fixing the missing field in v1alpha3

justinsb avatar Jul 28 '25 16:07 justinsb