cluster-api-provider-proxmox
RFC: v1alpha2 API
v1alpha1 has problems in the networking department which will require breaking changes.
Specifically, ProxmoxMachines split the default network device from the additional network devices for no reason I can discern. This duplicates code paths everywhere and actually removes possibilities: because of how network interfaces are structured in the API, the default device lacks options the other interfaces have. On top of that, the split into IPv4 and IPv6 pools is arbitrary; we can simply have an array of IPPools instead. That would also allow multiple IPPools per interface, a feature we cannot support yet. Finally, we may want to support network interfaces without any IPPools attached (DHCP, or BGP exclusively).
For now I'd change the API of ProxmoxMachines in the following way:
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxMachineTemplate
metadata:
  name: test-cluster-control-plane
  namespace: default
spec:
  template:
    spec:
      disks:
        bootVolume:
          disk: scsi0
          sizeGb: 100
          format: qcow2
          full: true
      memoryMiB: 4096
      network:
        networkDevices:
          - bridge: vmbr128
            model: virtio
            mtu: 9000
          - bridge: vmbr129
            dnsServers:
              - 10.4.1.1
            ipPoolRef:
              - apiGroup: ipam.cluster.x-k8s.io
                kind: GlobalInClusterIPPool
                name: shared-int-service-v4-inclusterippool
              - apiGroup: ipam.cluster.x-k8s.io
                kind: GlobalInClusterIPPool
                name: shared-int-service-v6-inclusterippool
            model: virtio
            name: net1
            linkMtu: 9000
      numCores: 4
      numSockets: 2
      sourceNode: proxmox1
      templateID: 100
I would resolve the conflicts around the first network device by prepending the InClusterIPPool to the ipPoolRef array. There's no need for the first device to be called net0; we can have the webhook add names automatically based on their position in the array.
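To make the naming rule concrete, here is a minimal sketch (using simplified stand-in types, not the real v1alpha2 API) of how a defaulting webhook could fill in missing names from the array position:

// A minimal sketch, not the real webhook: deriving missing device names
// from their position in the array. NetworkDevice and NetworkSpec are
// simplified stand-ins for the eventual v1alpha2 types.
package main

import "fmt"

type NetworkDevice struct {
	Name   string
	Bridge string
}

type NetworkSpec struct {
	NetworkDevices []NetworkDevice
}

// defaultDeviceNames fills in any empty Name as net<index>, so users
// never have to spell out net0 for the first device themselves.
func defaultDeviceNames(spec *NetworkSpec) {
	for i := range spec.NetworkDevices {
		if spec.NetworkDevices[i].Name == "" {
			spec.NetworkDevices[i].Name = fmt.Sprintf("net%d", i)
		}
	}
}

func main() {
	spec := NetworkSpec{NetworkDevices: []NetworkDevice{
		{Bridge: "vmbr128"},
		{Bridge: "vmbr129", Name: "net1"},
	}}
	defaultDeviceNames(&spec)
	fmt.Println(spec) // {[{net0 vmbr128} {net1 vmbr129}]}
}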
Regarding ProxmoxCluster, I'd leave the API as is, because a Cluster can only have an IPv4 and/or an IPv6 pool. Potentially we want to add DHCP there, but as far as I can see that's not a breaking change. We can discuss turning IPv4Pool and IPv6Pool into an actual pool reference that we supply with the cluster, which I think is the better choice, but that would just be breakage for breakage's sake (we'd only gain better validation on the to-be-created pool).
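For illustration only, a rough Go sketch of what such a cluster-level pool reference could look like next to today's inline definition; every type and field name here is an assumption made for the comparison, not the existing or proposed API:

// Illustration only: contrasting an inline pool definition with a pool
// reference supplied on the cluster. All names are assumptions.
package main

// InlinePool stands in for today's in-spec pool definition.
type InlinePool struct {
	Addresses []string
	Prefix    int
	Gateway   string
}

// PoolReference points at an externally managed IPAM pool instead,
// which moves validation onto the referenced object.
type PoolReference struct {
	APIGroup string
	Kind     string // e.g. GlobalInClusterIPPool
	Name     string
}

// ClusterNetwork shows the two shapes side by side.
type ClusterNetwork struct {
	IPv4Pool *InlinePool     // current style: defined inline
	IPv6Pool *InlinePool     // current style: defined inline
	PoolRefs []PoolReference // discussed alternative: references only
}

func main() {}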
Anything else you would like to add: If you have any more things you'd like to change about the API, let's coordinate here. Feedback very welcome.
Proposal
network:
  networkDevices:
    - bridge: vmbr128
      model: virtio
      mtu: 9000
    - bridge: vmbr129
      model: virtio
      name: net1
      linkMtu: 9000
      dnsServers:
        - 10.4.1.1
      ipPoolRef:
        - apiGroup: ipam.cluster.x-k8s.io
          kind: GlobalInClusterIPPool
          name: shared-int-service-v4-inclusterippool
        - apiGroup: ipam.cluster.x-k8s.io
          kind: GlobalInClusterIPPool
          name: shared-int-service-v6-inclusterippool
Here are my remarks:
I liked the way you restructured the schema for networkSpec; it's more convenient. I have some remarks regarding the IPPoolRefs. You want to change it into a list, which is okay:
- But how do we know which IPPool is for IPv4 and which for IPv6?
- Changing IPPoolRef to a list is a good choice, but it also means a lot of refactoring effort in the current code, especially for rendering cloud-init, bootstrap data, and getting the machine addresses.
- Given the remark above, you will also need extra effort to refactor existing tests and create new ones.
- Finally, I would assume the conversion webhook places the old IPv4 and IPv6 pools into the list.
Conclusion: I'm okay with this API change, but I'm a bit concerned about the effort required to change IPPoolRef into a list; some clarification on that would be great.
- We don't need to know which IPPool is for IPv4 and which is for IPv6. That data is essentially useless (where is it used?); the family can be derived from the assigned address itself, see the sketch after this list.
- The refactor makes the code much easier to maintain. At the moment it's a spaghetti nightmare, which makes implementing features like DHCP hard.
- Very true
- That's the plan.
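To back up the first point, a small standard-library-only sketch of how the address family can be read off the assigned address itself, e.g. when rendering cloud-init or collecting machine addresses:

// Sketch: the controller can derive the IP family from the address an
// IPAM pool hands back, so the pool reference never needs to be
// labelled as IPv4 or IPv6 up front.
package main

import (
	"fmt"
	"net/netip"
)

// ipFamily classifies an address string returned by an IPAM claim.
func ipFamily(addr string) (string, error) {
	a, err := netip.ParseAddr(addr)
	if err != nil {
		return "", err
	}
	if a.Is4() {
		return "IPv4", nil
	}
	return "IPv6", nil
}

func main() {
	for _, addr := range []string{"10.4.1.17", "2001:db8::17"} {
		family, err := ipFamily(addr)
		if err != nil {
			panic(err)
		}
		fmt.Println(addr, "->", family)
	}
}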
Alright, good to know :) I have talked to Maurice regarding the effort we're going to put into this change, and he approved it, which makes me happy.
I think we need to set up a meeting and figure out how I can assist, how we can split up the work, and how to approach the whole refactoring. We also need to get more people involved; specifically, Jan is interested.
I would also like to participate in a meeting, but, like here, more as a silent listener.
I've thought about it, here's the plan:
- We make a new branch, v1alpha2. All changes get merged in there eventually (in case we want to do more changes to the API)
- I do the networking API changes, these are vast and don't subdivide well (they break stuff all over the place)
- Jan can write the migrations from v1alpha1 to v1alpha2 once I'm done getting the controller working with the new API
- Mohamed does the code review
We could just merge to main, 0.6.0 won't be released until we're happy with v1alpha2 anyway.
It's good, but I think we still need to be aware of some dependent pieces. We need tasks for:
- v1alpha2 CRD
- Conversion Webhook (rough sketch below)
- Validation Webhook
- Cloud-init changes to accept the new spec.
- Controller logic changes to reconcile the new networkSpec.
- Adjust reconcileGetMachineAddresses.
If you agree, we can create a GitHub issue for each.
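As a starting point for the conversion webhook task, here is a rough sketch of folding the old split pool references into the new list; the field names IPv4PoolRef/IPv6PoolRef and the surrounding types are assumptions standing in for the generated API types, not the actual code:

// Rough sketch of the v1alpha1 -> v1alpha2 conversion for pool
// references. OldDevice/NewDevice and their fields are assumptions.
package main

import "fmt"

type PoolRef struct {
	APIGroup, Kind, Name string
}

// OldDevice mimics the split v1alpha1 shape.
type OldDevice struct {
	IPv4PoolRef *PoolRef
	IPv6PoolRef *PoolRef
}

// NewDevice mimics the v1alpha2 shape with a single list.
type NewDevice struct {
	IPPoolRef []PoolRef
}

// convertDevice places the old IPv4 pool first, then the IPv6 pool,
// so the conversion stays deterministic.
func convertDevice(old OldDevice) NewDevice {
	var converted NewDevice
	if old.IPv4PoolRef != nil {
		converted.IPPoolRef = append(converted.IPPoolRef, *old.IPv4PoolRef)
	}
	if old.IPv6PoolRef != nil {
		converted.IPPoolRef = append(converted.IPPoolRef, *old.IPv6PoolRef)
	}
	return converted
}

func main() {
	old := OldDevice{
		IPv4PoolRef: &PoolRef{"ipam.cluster.x-k8s.io", "GlobalInClusterIPPool", "shared-int-service-v4-inclusterippool"},
		IPv6PoolRef: &PoolRef{"ipam.cluster.x-k8s.io", "GlobalInClusterIPPool", "shared-int-service-v6-inclusterippool"},
	}
	fmt.Println(convertDevice(old))
}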
ref #304
per #304
- let's be sure to add automatic setting of the VIP in addition to the VM IPs
maybe now would also be a good time to add a 'kubernetesrelease' CRD: https://github.com/ionos-cloud/cluster-api-provider-proxmox/issues/312
ref #105 #53/#29
ref #16
Just wanted to send some support. You can do it! It's going to be awesome. I can help out with testing when you get to that point if you need some volunteers ...
Wanted to again send along some support. I'd send pizza if I knew where to send it ... In any case, I came across the virtual kubelet project and was thinking of implementing a Proxmox provider. Virtual kubelet lets you specify a provider such as AWS, and when something is scheduled to that virtual node, the container spins up on the provider, such as AWS. Since Proxmox can run containers on its own, I bet it wouldn't be too hard to implement a virtual kubelet provider that targets Proxmox to run containers. Once this new strategy you're working on is implemented, with VIPs and node IPs pulled from IPAM ranges and cluster autoscaling from those ranges (Cluster API gold), I can see about putting together a virtual kubelet provider. Proxmox could become the new standard for a Kubernetes platform, easily replacing VMware and all its latest drama. Appreciate all the effort! Looking forward to it.
.spec.Target can be removed in the new version, as it's not used anywhere and only causes confusion. If you want to deploy machines on a specific target node, you can use .spec.AllowedNodes with a single node.
Another suggestion from my side:
move all template selection under .spec.templateSelector:
....
spec:
  template:
    spec:
      templateSelector:
        templateNode: node1
        templateId: 125
....
or
....
spec:
  template:
    spec:
      templateSelector:
        templateTags:
          - tag1
          - tag2
        localStorage: true
....
This would separate template selection from the rest of the spec.