
(Experimental) bare-metal with IPv6

Open justinsb opened this pull request 1 year ago • 8 comments

IPv6 brings some new complexities, particularly around IPAM.

We create a test and then fix a few things:

  • We need to assign the podCIDR for IPv6, so we add support to kops-controller. The source of this information is the Host CRD.
  • Because we are assigning the podCIDR from the Host CRD, we need Host records for the control-plane nodes. However, there are bootstrapping problems around creating a CRD during enrollment of the control-plane nodes, so instead we can now generate a Host object in YAML and apply it separately (see the sketch after this list). A high-security workflow would probably create the Host records separately anyway, because they are how we validate nodes.
  • Previously we always set the kubelet cloud-provider=external flag, but this assumes we are running a CCM. If we are not running a CCM (as on metal), we should not set the flag; if we do, kubelet sets the node.kops.k8s.io/uninitialized taint for the CCM to clear, and nothing ever clears it.
  • We need to make sure there is an IPv6 default route so that kubelet can discover its node IP correctly. We could put this into the Host CRD, but it does seem that most nodes will have a default route anyway.
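
For illustration, here is a rough sketch of what such a pre-created Host object could look like with an IPv6 podCIDR attached. The apiVersion, namespace, and field names (instanceGroup, publicKey, podCIDRs) are assumptions based on the description above, not taken from this PR, and may not match the final schema:

    # Hypothetical Host object, generated ahead of time and applied separately
    # (for example with kubectl apply -f host-vm0.yaml) before the node enrolls.
    apiVersion: kops.k8s.io/v1alpha2
    kind: Host
    metadata:
      name: vm0
      namespace: kube-system
    spec:
      # Instance group this bare-metal host belongs to (assumed field name).
      instanceGroup: control-plane
      # Public key used to validate the node during enrollment (assumed field name).
      publicKey: "ssh-ed25519 AAAA... (placeholder)"
      # IPv6 pod CIDR for kops-controller to assign to the Node (assumed field name).
      podCIDRs:
        - 2001:db8:0:1::/64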

justinsb avatar Nov 11 '24 16:11 justinsb

I am trying to upload this, and then I can rebase as I/we fix each problem.

Current problem is from nodeup:

vm0 nodeup[703]: W1111 17:07:00.322041     703 main.go:133] got error running nodeup (will retry in 30s): error building loader: building *model.PrefixBuilder: kOps IPAM controller not supported on cloud "metal"

So we need to decide how the podCIDR is assigned!

justinsb avatar Nov 11 '24 16:11 justinsb

/retest

justinsb avatar Nov 17 '24 15:11 justinsb

/retest

justinsb avatar Feb 20 '25 16:02 justinsb

/retest

justinsb avatar Feb 21 '25 14:02 justinsb

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 22 '25 15:05 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jun 24 '25 21:06 k8s-triage-robot

/remove-lifecycle rotten

hakman avatar Jul 04 '25 12:07 hakman

cc @hakman

I think this is now uncontroversial (I hope). We assign podCIDRs to nodes if they are configured on the Host object. If users don't want to do that, they just don't set podCIDRs on the Host object.
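
As a rough sketch of that opt-in behaviour (apiVersion and field names are assumptions, not taken from this PR): a Host that carries podCIDRs gets them copied to the corresponding Node's spec.podCIDRs, while a Host without the field is left alone.

    # Hypothetical Host with podCIDRs: kops-controller assigns them to the Node,
    # so the Node ends up with spec.podCIDRs: ["2001:db8:0:2::/64"].
    apiVersion: kops.k8s.io/v1alpha2
    kind: Host
    metadata:
      name: vm1
    spec:
      podCIDRs:
        - 2001:db8:0:2::/64
    ---
    # Hypothetical Host without podCIDRs (other spec fields omitted): kops-controller
    # makes no assignment, and pod CIDRs are managed however the cluster already manages them.
    apiVersion: kops.k8s.io/v1alpha2
    kind: Host
    metadata:
      name: vm2
    spec: {}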

justinsb avatar Jul 26 '25 16:07 justinsb

> cc @hakman
>
> I think this is now uncontroversial (I hope). We assign podCIDRs to nodes if they are configured on the Host object. If users don't want to do that, they just don't set podCIDRs on the Host object.

Cool, I will take a look soon. 🚀

hakman avatar Jul 26 '25 17:07 hakman

/retest

justinsb avatar Jul 27 '25 01:07 justinsb

/test all

hakman avatar Jul 27 '25 04:07 hakman

/hold in case you want to update the other APIs (could also be a separate PR).

/lgtm

/approve

hakman avatar Jul 27 '25 04:07 hakman

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Jul 27 '25 04:07 k8s-ci-robot

/test pull-kops-e2e-k8s-aws-amazonvpc

hakman avatar Jul 27 '25 04:07 hakman

/test pull-kops-e2e-k8s-gce-cilium

hakman avatar Jul 27 '25 04:07 hakman

/test pull-kops-e2e-k8s-aws-calico

hakman avatar Jul 27 '25 05:07 hakman

/test pull-kops-e2e-k8s-aws-amazonvpc

hakman avatar Jul 27 '25 06:07 hakman

/hold cancel

I propose adding a round-trip test alongside fixing the missing field in v1alpha3

justinsb avatar Jul 28 '25 16:07 justinsb