
Towards Beta

simplysoft opened this issue on Oct 13, 2023

Please find a draft PR with our current variation/extension of this provider. With those changes we are closer to being able to use it outside of purely experimental setups:

  • Run CAPPX with a non-privileged PVE user / permissions (see the sketch after this list)
    • Work around Proxmox shortcomings (VNC shell login, storage folder permissions)
    • Add VMs to a Proxmox resource pool (CAPPX only needs permission to manage VMs in that resource pool)
  • Reworked storage approach to use import-from, allowing a different storage to be used for VMs (related to #93)
  • Control over VM IDs & node placement
    • Configure nodes at the ProxmoxCluster and ProxmoxMachineTemplate level
    • Initial support for CAPI Failure Domains (Node as Failure Domain Strategy)
    • Configure VM ID ranges
  • Network Config:
    • Configurable bridge & VLAN tag
    • Initial support for CAPI IPAM (#4)
  • Various smaller fixes / changes
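
For reference, a minimal sketch of what the PVE side of such a restricted setup could look like; the role name, user and exact privilege list are illustrative only, not something this PR mandates:

# illustrative: dedicated role, user and resource pool for CAPPX
pveum role add CAPPX --privs "VM.Allocate VM.Clone VM.Audit VM.Console VM.PowerMgmt VM.Config.CDROM VM.Config.CPU VM.Config.Cloudinit VM.Config.Disk VM.Config.Memory VM.Config.Network VM.Config.Options Datastore.AllocateSpace Datastore.Audit"
pveum user add cappx@pve --password '<password>'
pvesh create /pools --poolid capi-proxmox
# grant the role only on the pool and the VM storage, not cluster-wide
pveum aclmod /pool/capi-proxmox --users cappx@pve --roles CAPPX
pveum aclmod /storage/${PROXMOX_VM_STORAGE} --users cappx@pve --roles CAPPX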

We are currently testing with Cluster API Provider RKE2, so this also gives an example for #68.

Sample using the new features & RKE2:
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  name: ${CLUSTER_NAME}
  namespace: ${NAMESPACE}
spec:
  controlPlaneEndpoint:
    host: ${CONTROLPLANE_ADDRESS}
    port: 6443
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: RKE2ControlPlane
    name: ${CLUSTER_NAME}
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: ProxmoxCluster
    name: ${CLUSTER_NAME}
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: ProxmoxCluster
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  name: ${CLUSTER_NAME}
  namespace: ${NAMESPACE}
spec:
  controlPlaneEndpoint:
    host: ${CONTROLPLANE_ADDRESS}
    port: 6443
  serverRef:
    endpoint: https://${PROXMOX_ADDRESS}:8006/api2/json
    secretRef:
      name: ${CLUSTER_NAME}
      namespace: ${NAMESPACE}
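  # VMs are created in this resource pool; the CAPPX user only needs permissions on it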
  resourcePool: capi-proxmox
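  # with nodeAsFailureDomain, each node listed below is exposed as a CAPI failure domain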
  failureDomain:
    nodeAsFailureDomain: true
  nodes:
    - node1
    - node2
    - node3

---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
kind: RKE2ControlPlane
metadata:
  name: ${CLUSTER_NAME}
  namespace: ${NAMESPACE}
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: ProxmoxMachineTemplate
    name: ${CLUSTER_NAME}-controlplane
  replicas: 3
  agentConfig:
    version: ${K8S_VERSION}+rke2r1
    kubelet:
      extraArgs:
        - 'cloud-provider=external'
  nodeDrainTimeout: 2m
  registrationMethod: "address"
  registrationAddress: "${CONTROLPLANE_ADDRESS}"
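  # set up kube-vip for the control-plane VIP: install its RBAC, pre-pull the image,
  # then generate a DaemonSet manifest advertising the address via ARP
  # (RKE2 auto-applies everything under server/manifests)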
  postRKE2Commands:
    - curl https://kube-vip.io/manifests/rbac.yaml > /var/lib/rancher/rke2/server/manifests/kube-vip-rbac.yaml
    - /var/lib/rancher/rke2/bin/crictl -r "unix:///run/k3s/containerd/containerd.sock" pull ghcr.io/kube-vip/kube-vip:latest
    - CONTAINERD_ADDRESS=/run/k3s/containerd/containerd.sock /var/lib/rancher/rke2/bin/ctr -n k8s.io run --rm --net-host ghcr.io/kube-vip/kube-vip:latest vip /kube-vip manifest daemonset --arp --address ${CONTROLPLANE_ADDRESS} --controlplane --leaderElection --taint --services --inCluster | tee /var/lib/rancher/rke2/server/manifests/kube-vip.yaml
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: ProxmoxMachineTemplate
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  name: ${CLUSTER_NAME}-controlplane
  namespace: ${NAMESPACE}
spec:
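  # VM IDs for machines from this template are taken from the range below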
  vmIDs:
    start: 1000
    end: 1010
  template:
    spec:
      hardware:
        cpu: 4
        memory: 8192
        disk: 16G
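        # target storage for the boot disk; the cloud image below is imported via import-from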
        storage: ${PROXMOX_VM_STORAGE}
      image:
        checksum: ${CLOUD_IMAGE_HASH_SHA256}
        checksumType: sha256
        url: ${CLOUD_IMAGE_URL}
      network:
        bridge: ${VM_BRIDGE}
        vlanTag: ${VM_VLAN}
        nameServer: ${IPV4_NAMESERVER}
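        # claim the machine address from the referenced CAPI IPAM pool (example pool manifest below)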
        ipConfig:
          IPv4FromPoolRef:
            name: ${IPV4_POOL_NAME}
            apiGroup: ipam.cluster.x-k8s.io
            kind: InClusterIPPool
      options:
        onBoot: true
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  name: ${CLUSTER_NAME}-md-0
  namespace: ${NAMESPACE}
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: 3
  selector:
    matchLabels: {}
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha1
          kind: RKE2ConfigTemplate
          name: ${CLUSTER_NAME}-md-0
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: ProxmoxMachineTemplate
        name: ${CLUSTER_NAME}-md-0
      version: ${K8S_VERSION}
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha1
kind: RKE2ConfigTemplate
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  name: ${CLUSTER_NAME}-md-0
  namespace: ${NAMESPACE}
spec:
  template:
    spec:
#      preRKE2Commands:
#        - sleep 30 # fix to give OS time to become ready
      agentConfig:
        version: ${K8S_VERSION}+rke2r1
        kubelet:
          extraArgs:
            - "cloud-provider=external"
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: ProxmoxMachineTemplate
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  name: ${CLUSTER_NAME}-md-0
  namespace: ${NAMESPACE}
spec:
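  # place these machines only on the listed Proxmox nodes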
  nodes:
    - node1
    - node2
  vmIDs:
    start: 1010
    end: 1019
  template:
    spec:
      hardware:
        cpu: 4
        memory: 8192
        disk: 16G
        storage: ${PROXMOX_VM_STORAGE}
      image:
        checksum: ${CLOUD_IMAGE_HASH_SHA256}
        checksumType: sha256
        url: ${CLOUD_IMAGE_URL}
      network:
        bridge: ${VM_BRIDGE}
        vlanTag: ${VM_VLAN}
        nameServer: ${IPV4_NAMESERVER}
        ipConfig:
          IPv4FromPoolRef:
            name: ${IPV4_POOL_NAME}
            apiGroup: ipam.cluster.x-k8s.io
            kind: InClusterIPPool
      options:
        onBoot: true
---
apiVersion: v1
kind: Secret
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  name: ${CLUSTER_NAME}
  namespace: ${NAMESPACE}
stringData:
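  # set either PROXMOX_USER/PROXMOX_PASSWORD or PROXMOX_TOKENID/PROXMOX_SECRET; leave the unused pair empty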
  PROXMOX_PASSWORD: "${PROXMOX_PASSWORD}"
  PROXMOX_SECRET: ""
  PROXMOX_TOKENID: ""
  PROXMOX_USER: "${PROXMOX_USER}"
type: Opaque
---
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  name: ${CLUSTER_NAME}-crs-0
  namespace: ${NAMESPACE}
spec:
  clusterSelector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  resources:
    - kind: ConfigMap
      name: ${CLUSTER_NAME}-cloud-controller-manager
  strategy: Reconcile
---
apiVersion: v1
data:
  cloud-controller-manager.yaml: |
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: proxmox-cloud-controller-manager
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: system:proxmox-cloud-controller-manager
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: cluster-admin
    subjects:
      - kind: ServiceAccount
        name: proxmox-cloud-controller-manager
        namespace: kube-system
    ---
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      labels:
        k8s-app: cloud-controller-manager
      name: cloud-controller-manager
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          k8s-app: cloud-controller-manager
      template:
        metadata:
          labels:
            k8s-app: cloud-controller-manager
        spec:
          serviceAccountName: proxmox-cloud-controller-manager
          containers:
            - name: cloud-controller-manager
              image: ghcr.io/sp-yduck/cloud-provider-proxmox:latest
              command:
                - /usr/local/bin/cloud-controller-manager
                - --cloud-provider=proxmox
                - --cloud-config=/etc/proxmox/config.yaml
                - --leader-elect=true
                - --use-service-account-credentials
                - --controllers=cloud-node,cloud-node-lifecycle
              volumeMounts:
                - name: cloud-config
                  mountPath: /etc/proxmox
                  readOnly: true
              livenessProbe:
                httpGet:
                  path: /healthz
                  port: 10258
                  scheme: HTTPS
                initialDelaySeconds: 20
                periodSeconds: 30
                timeoutSeconds: 5
          volumes:
            - name: cloud-config
              secret:
                secretName: cloud-config
          tolerations:
            - key: node.cloudprovider.kubernetes.io/uninitialized
              value: "true"
              effect: NoSchedule
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
            - key: node-role.kubernetes.io/master
              operator: Exists
              effect: NoSchedule
          nodeSelector:
            node-role.kubernetes.io/control-plane: "true"
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: cloud-config
      namespace: kube-system
    stringData:
      config.yaml: |
        proxmox:
          url: https://${PROXMOX_ADDRESS}:8006/api2/json
          user: ""
          password: ""
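          # the CCM authenticates with its own API token, separate from the CAPPX credentials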
          tokenID: "${PROXMOX_CCM_TOKENID}"
          secret: "${PROXMOX_CCM_SECRET}"
kind: ConfigMap
metadata:
  name: ${CLUSTER_NAME}-cloud-controller-manager
  namespace: ${NAMESPACE}
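
For completeness, the IPAM pool referenced via ${IPV4_POOL_NAME} could look roughly like the following; the apiVersion and field names follow the in-cluster IPAM provider's v1alpha2 API, and the address range is a placeholder:

apiVersion: ipam.cluster.x-k8s.io/v1alpha2
kind: InClusterIPPool
metadata:
  name: ${IPV4_POOL_NAME}
  namespace: ${NAMESPACE}
spec:
  addresses:
    - 10.10.10.100-10.10.10.120
  prefix: 24
  gateway: 10.10.10.1
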
Beyond those current changes, based on our assessment the existing CRDs would require some bigger refactoring to enable other use cases we need, such as support for multiple network devices & disks (sketched below). Those changes will definitely be breaking changes.
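
To make the scope of that refactoring concrete, the current singular network and hardware fields would presumably have to become lists, along these lines (purely illustrative, not a proposed API):

network:
  devices:
    - bridge: vmbr0
      vlanTag: 100
    - bridge: vmbr1
hardware:
  disks:
    - size: 16G
      storage: local-lvm
    - size: 100G
      storage: ceph-rbd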

Any suggestions & feedback are welcome.

simplysoft, Oct 13 '23