cluster-api-provider-proxmox

feat: support for externally managed control plane

prometherion opened this pull request 1 year ago • 21 comments

Issue #95

Description of changes:

Support an empty Control Plane endpoint when the ProxmoxCluster is used with an externally managed Control Plane.

The ProxmoxCluster controller will wait for a valid IP before proceeding to mark the infrastructure ready.
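
For illustration, a minimal sketch of that guard, assuming the endpoint check drives requeueing; the function and names below are illustrative, not the provider's actual code:

package controllers

import (
    "time"

    clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
    ctrl "sigs.k8s.io/controller-runtime"
)

// reconcileEndpoint is a sketch of the behaviour described above: if the
// control plane endpoint is still empty (e.g. an external provider such as
// Kamaji has not assigned it yet), requeue instead of marking the
// infrastructure ready.
func reconcileEndpoint(ep clusterv1.APIEndpoint) (ctrl.Result, bool) {
    if ep.Host == "" || ep.Port == 0 {
        // Endpoint not set yet: retry later, infrastructure stays not ready.
        return ctrl.Result{RequeueAfter: 30 * time.Second}, false
    }
    // Endpoint is set; the caller can proceed to mark the infrastructure ready.
    return ctrl.Result{}, true
}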

Testing performed:

N.A.

prometherion avatar Feb 09 '24 11:02 prometherion

Quality Gate failed

Failed conditions
20.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

sonarqubecloud[bot] avatar Feb 09 '24 11:02 sonarqubecloud[bot]

Before providing more test coverage, may I ask for a quick review of the proposed changes from a workflow perspective?

It's not clear to me whether the maintainers are open to skipping the validation of the ProxmoxCluster.spec.controlPlaneEndpoint field.

prometherion avatar Feb 09 '24 14:02 prometherion

Hi @prometherion, thank you for your contribution. Skipping validation for optional fields is fine, but please make sure your changes don't alter the validation behavior of other fields. Users should see errors as early as possible. The controller's check for an empty control plane endpoint should happen after IPAM has been reconciled, so that users aren't left waiting on the external process to set the control plane endpoint.

Does Kamaji set the port and address separately? I'm asking because my understanding was that the endpoint is always written in full or not at all. This would mean you could merge the two conditions into a single MissingControlPlaneEndpoint.
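
For what it's worth, a rough sketch of what a single merged condition could look like using the Cluster API conditions utilities; the constant and helper names here are illustrative, not the provider's actual identifiers:

package controllers

import (
    clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
    "sigs.k8s.io/cluster-api/util/conditions"
)

// Illustrative reason string covering both a missing host and a missing port.
const missingControlPlaneEndpointReason = "MissingControlPlaneEndpoint"

// markEndpointMissing records a single condition for an unset endpoint; it
// would be called only after IPAM has been reconciled, so IPAM errors are
// surfaced first.
func markEndpointMissing(obj conditions.Setter, ready clusterv1.ConditionType) {
    conditions.MarkFalse(obj, ready, missingControlPlaneEndpointReason,
        clusterv1.ConditionSeverityInfo,
        "waiting for the control plane endpoint to be set")
}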

avorima avatar Feb 19 '24 17:02 avorima

Does Kamaji set the port and address separately?

Kamaji sets them in a single transaction, yes; I can unify those checks.

prometherion avatar Feb 22 '24 14:02 prometherion

Anything needed here?

mcbenjemaa avatar Feb 28 '24 15:02 mcbenjemaa

Thanks for the heads up @mcbenjemaa, I'm planning to work on this and get the PR ready for review by the end of the week or the following one.

Pretty busy days, sorry.

prometherion avatar Feb 28 '24 16:02 prometherion

take your time mate

mcbenjemaa avatar Feb 29 '24 12:02 mcbenjemaa

@prometherion Are there any updates on this, or can I give it a try?

mcbenjemaa avatar Apr 30 '24 08:04 mcbenjemaa

I'm finally revamping it; sorry for being late, @mcbenjemaa.

Let me know if we're ready to get this tested.

prometherion avatar May 27 '24 15:05 prometherion

Can you provide a use case to test it with Kamaji?

mcbenjemaa avatar May 28 '24 09:05 mcbenjemaa

However, I don't think I like the 'magic' behaviour of empty host and(?) port meaning externally managed. Someone could omit host/port by accident without actually intending to use Kamaji or so. This needs very clear documentation. (Technically, this doesn't actually make the fields optional (-:)

Documentation is a great start, of course; I'm wondering if we could implement this kind of check in a different way.

Cluster API has a contract for externally managed control planes via the status key externalManagedControlPlane.

When a user opts in to Kamaji, or any other Control Plane provider that satisfies that status key contract, we could skip the validation requiring a populated ControlPlaneEndpoint struct. That way we add guardrails for users who could otherwise shoot themselves in the foot.
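
As a sketch of how the provider could read that contract field from whatever object the Cluster's controlPlaneRef points to; the helper name and wiring are illustrative, not the provider's actual code:

package controllers

import (
    "context"

    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// isExternallyManaged reads status.externalManagedControlPlane from the
// object referenced by the Cluster's controlPlaneRef (e.g. a
// KamajiControlPlane). It returns false when the field is absent.
func isExternallyManaged(ctx context.Context, c client.Reader, cluster *clusterv1.Cluster) (bool, error) {
    ref := cluster.Spec.ControlPlaneRef
    if ref == nil {
        return false, nil
    }
    cp := &unstructured.Unstructured{}
    cp.SetAPIVersion(ref.APIVersion)
    cp.SetKind(ref.Kind)
    if err := c.Get(ctx, client.ObjectKey{Namespace: cluster.Namespace, Name: ref.Name}, cp); err != nil {
        return false, err
    }
    managed, found, err := unstructured.NestedBool(cp.Object, "status", "externalManagedControlPlane")
    if err != nil || !found {
        return false, err
    }
    return managed, nil
}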

prometherion avatar Jun 10 '24 10:06 prometherion

Two points, then:

  1. We should set externalManagedControlPlane in the status when selected
  2. I like it when things are explicit (or, put differently, I don't like magic values), so I'd opt for a new key in ProxmoxClusterSpec (we can name it ExternalManagedControlPlane to stay consistent): the user either sets it to true or populates ControlPlaneEndpoint, and we validate that only one of those is set (see the sketch below).
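
A rough sketch of what point 2 could look like; the type name and validation helper below are illustrative, not the final API:

package v1alpha1

import (
    "errors"

    clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// Illustrative excerpt of the spec being discussed.
type ProxmoxClusterSpecSketch struct {
    // ControlPlaneEndpoint is the endpoint of a user-managed control plane.
    ControlPlaneEndpoint *clusterv1.APIEndpoint `json:"controlPlaneEndpoint,omitempty"`

    // ExternalManagedControlPlane marks the control plane as managed by an
    // external provider (e.g. Kamaji), which sets the endpoint itself.
    ExternalManagedControlPlane bool `json:"externalManagedControlPlane,omitempty"`
}

// validateSpec enforces that exactly one of the two is provided; this logic
// would typically live in a validating webhook.
func validateSpec(spec ProxmoxClusterSpecSketch) error {
    hasEndpoint := spec.ControlPlaneEndpoint != nil && spec.ControlPlaneEndpoint.Host != ""
    switch {
    case spec.ExternalManagedControlPlane && hasEndpoint:
        return errors.New("controlPlaneEndpoint must be empty when externalManagedControlPlane is true")
    case !spec.ExternalManagedControlPlane && !hasEndpoint:
        return errors.New("either controlPlaneEndpoint or externalManagedControlPlane must be set")
    }
    return nil
}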

wikkyk avatar Jun 11 '24 05:06 wikkyk

  1. We should set externalManagedControlPlane in the status when selected

I'm a bit confused here. For our controller (Kamaji), this is already done; nothing is expected from the Proxmox CAPI provider.

  2. I like it when things are explicit (or, put differently, I don't like magic values), so I'd opt for a new key in ProxmoxClusterSpec (we can name it ExternalManagedControlPlane to stay consistent): the user either sets it to true or populates ControlPlaneEndpoint, and we validate that only one of those is set.

Working on this.

prometherion avatar Jul 01 '24 11:07 prometherion

Dunno who to ping here; I think I've addressed all the requested changes.

prometherion avatar Jul 15 '24 12:07 prometherion

@wikkyk no worries, no pressure at all. I truly understand the effort that goes into maintaining open source, especially with summer approaching.

I addressed the changes you requested, but noticed the e2e tests failed; could you help me understand what's broken here?

prometherion avatar Jul 20 '24 12:07 prometherion

@prometherion the e2e failures are on us. We'll move them to more reliable infra Soon™ :)

wikkyk avatar Jul 24 '24 11:07 wikkyk

What is missing here, and what can we do?

mcbenjemaa avatar Sep 04 '24 09:09 mcbenjemaa

@mcbenjemaa I just fixed the broken generated files. An e2e run would be cool, although it seems a bit flaky according to the latest runs.

I can also try to provide a small recorded smoke test showing the integration between Proxmox and Kamaji: unfortunately, providing a proper test is a bit complicated given the dependencies between the moving parts.

prometherion avatar Sep 05 '24 20:09 prometherion

@mcbenjemaa I just fixed the broken generated files. An e2e run would be cool, although it seems a bit flaky according to the latest runs.

I can also try to provide a small recorded smoke test showing the integration between Proxmox and Kamaji: unfortunately, providing a proper test is a bit complicated given the dependencies between the moving parts.

Can you share the manifest you used to provision a Proxmox cluster?

mcbenjemaa avatar Sep 06 '24 13:09 mcbenjemaa

@prometherion, we are happy to add support for this. Can you help us document how to provide a cluster template and the technical considerations for the Kamaji control plane provider?

mcbenjemaa avatar Nov 04 '24 15:11 mcbenjemaa

Finally! Thank you for your contribution! :)

wikkyk avatar Nov 04 '24 15:11 wikkyk

@mcbenjemaa absolutely, it's on my todo list!

prometherion avatar Nov 07 '24 13:11 prometherion

Also sharing here a reference to get this working with Kamaji as an externally managed Control Plane:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: proxmox-quickstart
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - REDACTED/REDACTED
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
    kind: KamajiControlPlane
    name: proxmox-quickstart
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: ProxmoxCluster
    name: proxmox-quickstart
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
kind: KamajiControlPlane
metadata:
  name: proxmox-quickstart
  namespace: default
spec:
  dataStoreName: default
  addons:
    coreDNS: { }
    kubeProxy: { }
  kubelet:
    cgroupfs: systemd
    preferredAddressTypes:
    - InternalIP
  network:
    serviceType: LoadBalancer
  deployment: {}
  replicas: 2
  version: 1.29.7
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxCluster
metadata:
  name: proxmox-quickstart
  namespace: default
spec:
  allowedNodes:
  - pve
  dnsServers:
  - REDACTED
  - REDACTED
  externalManagedControlPlane: true
  ipv4Config:
    addresses:
    - REDACTED-REDACTED
    gateway: REDACTED
    prefix: REDACTED
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: proxmox-quickstart-workers
  namespace: default
spec:
  clusterName: proxmox-quickstart
  replicas: 2
  selector:
    matchLabels: null
  template:
    metadata:
      labels:
        node-role.kubernetes.io/node: ""
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: proxmox-quickstart-worker
      clusterName: proxmox-quickstart
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
        kind: ProxmoxMachineTemplate
        name: proxmox-quickstart-worker
      version: v1.29.7
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxMachineTemplate
metadata:
  name: proxmox-quickstart-worker
  namespace: default
spec:
  template:
    spec:
      disks:
        bootVolume:
          disk: scsi0
          sizeGb: REDACTED
      format: qcow2
      full: true
      memoryMiB: REDACTED
      network:
        default:
          bridge: REDACTED
          model: virtio
      numCores: REDACTED
      numSockets: REDACTED
      sourceNode: pve
      templateID: REDACTED
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: proxmox-quickstart-worker
  namespace: default
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            provider-id: proxmox://'{{ ds.meta_data.instance_id }}'
      users:
      - name: root
        sshAuthorizedKeys:
        - REDACTED

I wasn't able to get the worker nodes to join the Control Plane, mostly because I'm working in a kind environment, with Proxmox VE deployed in a Virtual Machine via Vagrant.

But overall everything looks good from the clusterctl describe command:

NAME                                                         READY  SEVERITY  REASON                       SINCE  MESSAGE                                                       
Cluster/proxmox-quickstart                                   True                                          7m30s                                                                 
├─ClusterInfrastructure - ProxmoxCluster/proxmox-quickstart  True                                          7m43s                                                                 
├─ControlPlane - KamajiControlPlane/proxmox-quickstart                                                                                                                           
└─Workers                                                                                                                                                                        
  └─MachineDeployment/proxmox-quickstart-workers             False  Warning   WaitingForAvailableMachines  7m43s  Minimum availability requires 2 replicas, current 0 available
    ├─Machine/proxmox-quickstart-workers-ft5tv-gfxf8         False  Error     VMProvisionFailed            35s    1 of 2 completed                                               
    └─Machine/proxmox-quickstart-workers-ft5tv-gjnmf         False  Info      PoweringOn                   5m14s  1 of 2 completed

prometherion avatar Nov 09 '24 16:11 prometherion