cluster-api-provider-proxmox
add MachineHealthCheck
Hi @sp-yduck,
this adds a MachineHealthCheck for all Machines of a new cluster. It can help when a node does not reach the running state, e.g. because of #145, network problems during startup, or other reasons.
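For reference, a minimal sketch of what such a MachineHealthCheck could look like; the cluster name, selector labels, and timeouts below are illustrative, not the exact values from this PR:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: my-cluster-mhc       # hypothetical name
  namespace: default
spec:
  clusterName: my-cluster    # hypothetical cluster name
  # select all Machines belonging to the cluster
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: my-cluster
  # mark a Machine unhealthy when its Node's Ready condition
  # stays False or Unknown for too long
  unhealthyConditions:
    - type: Ready
      status: "False"
      timeout: 300s
    - type: Ready
      status: Unknown
      timeout: 300s
```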
Thank you for the PR!
As you may know, this template file is used for the Quick Start, and I want to keep the Quick Start setup minimal. So if you want to include this, the option is:
- create a new template for it so that users can choose a specific template to try specific features (see the sketch after this list) ref: https://cluster-api.sigs.k8s.io/clusterctl/commands/generate-cluster.html?highlight=flavor#flavors ref: https://cluster-api.sigs.k8s.io/clusterctl/commands/generate-cluster.html?highlight=flavor#alternative-source-for-cluster-templates
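For example, if the release assets contained a template named `cluster-template-mhc.yaml` (clusterctl maps `--flavor <name>` to an asset named `cluster-template-<name>.yaml`), users could opt in with something like the following; the flavor name `mhc` is hypothetical:

```shell
# generate a cluster manifest from the hypothetical "mhc" flavor
# instead of the default cluster-template.yaml
clusterctl generate cluster my-cluster \
  --infrastructure proxmox \
  --flavor mhc \
  > my-cluster.yaml
```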
I also changed the location of the quick start template; I hope this is fine for you. I don't know how clusterctl knows where to find the cluster templates. Do I have to change something else to make this work?
clusterctl checks the assets of the release, so the file changes are OK. The thing is, I am using `make release` to output these assets for each release:
- `make release-templates` https://github.com/sp-yduck/cluster-api-provider-proxmox/blob/bbcdd56993d21f9e3581eed558806fc909b71cec/Makefile#L219-L220
- `make generate-e2e-templates` https://github.com/sp-yduck/cluster-api-provider-proxmox/blob/bbcdd56993d21f9e3581eed558806fc909b71cec/Makefile#L111-L113

I think you can use `kustomize build template/base` or something similar for both of them; a rough sketch follows.
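Something like this could work in the Makefile; the overlay paths and output names here are assumptions, not the actual repository layout:

```makefile
# hypothetical targets: render release templates from kustomize overlays
release-templates: $(KUSTOMIZE)
	mkdir -p out
	$(KUSTOMIZE) build templates/base > out/cluster-template.yaml
	$(KUSTOMIZE) build templates/mhc > out/cluster-template-mhc.yaml
```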
btw, I believe MHC does not help for:
> E.g. https://github.com/sp-yduck/cluster-api-provider-proxmox/issues/145 or network problems during startup or other reasons.

since MHC checks the Machine and Node objects to confirm whether the Node (in the workload cluster) is ready. So in a case like issue #145, if the VM becomes unhealthy before it joins the k8s cluster, MHC cannot find a Node associated with that unhealthy VM and cannot remediate it.
I'm not sure about the detailed mechanics, but I can confirm that if a VM does not boot, it is deleted and recreated. (Though at the moment no VM boots, because I get the error every time 😢)
I think MHC also checks the `status.conditions[].type` `Ready` field by default; that is the only way I can explain the behavior.
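One detail that may explain the delete-and-recreate behavior: the upstream MachineHealthCheck spec also has a `nodeStartupTimeout` field, so a Machine whose Node never appears at all can still be remediated once that timeout expires. A hedged excerpt, with illustrative values:

```yaml
# excerpt of a MachineHealthCheck spec; values are illustrative
spec:
  # remediate Machines whose Node never joins the cluster at all
  nodeStartupTimeout: 10m
  unhealthyConditions:
    - type: Ready      # the status.conditions[].type checked on the Node
      status: "False"
      timeout: 300s
```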