ClusterFactory
Research and implement an MVP for cluster autoscaling
Let's focus on Exoscale and a single availability group first.
There are multiple solutions, but this is my proposal. The idea is based on Slurm cloud-bursting. We need to solve two problems:
- How to spawn a VM that joins a private network? The private repository with Ansible can help us.
- How to make the VM join the cluster? `cfctl` could be the answer.

Therefore, we should combine these two features, because `cfctl` is similar to our Ansible.
- [ ] Implement node creation on Exoscale in `cfctl`.
- [ ] Implement a join mechanism. It's easy: healthcheck first with SSH, then use `cfctl apply` (or whatever).
- [ ] Implement the gRPC API for Cluster Autoscaler. Ref: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/externalgrpc/README.md . Do not ignore the considerations. Hetzner seems to be a good example of the expected user experience.
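
The "healthcheck first with SSH, then apply" idea could look roughly like the sketch below. It is only an illustration: the health check is a plain TCP probe of port 22 (weaker than a real SSH login), and the `--config` flag on `cfctl apply` is an assumption that should be checked against the actual CLI. The function names are hypothetical.

```python
import socket
import subprocess
import time


def wait_for_ssh(host: str, port: int = 22, timeout: float = 300.0,
                 interval: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves the port is open, not that login works.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


def apply_command(config_path: str) -> list:
    # Assumption: `cfctl apply` takes the cluster definition through `--config`.
    return ["cfctl", "apply", "--config", config_path]


def join_node(host: str, config_path: str) -> None:
    """Wait for the new VM to answer on SSH, then let cfctl converge the cluster."""
    if not wait_for_ssh(host):
        raise TimeoutError(f"SSH on {host} did not become reachable")
    subprocess.run(apply_command(config_path), check=True)
```

Polling with a deadline keeps the autoscaler from blocking forever on a VM that never boots; the timeout values are placeholders.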
To add more details, here is the join mechanism for k0s:
- On a controller node, call `k0s token create --role=worker`; this will create a join token. For extra security, the token must have an expiration time: `k0s token create --role=worker --expiry=1h`.
- On the new virtual machine, via SSH, install k0s and call `sudo k0s install worker --token-file /path/to/token/file`.
- Then start: `sudo k0s start`.
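
The steps above can be sketched as a small script that drives both machines over SSH. This is a sketch under assumptions, not a finished implementation: it assumes the `k0s` binary is already installed on the new VM, that the SSH user can run `k0s token create` on the controller and passwordless `sudo` on the worker, and that the token is copied with `sudo tee`. The helper names and the `token_path` parameter are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional


def token_create_cmd(expiry: str = "1h") -> List[str]:
    # Join token for a worker, with an expiration time for extra security.
    return ["k0s", "token", "create", "--role=worker", f"--expiry={expiry}"]


def worker_join_cmds(token_path: str) -> List[List[str]]:
    # Commands to run on the new VM once the token file is in place.
    return [
        ["sudo", "k0s", "install", "worker", "--token-file", token_path],
        ["sudo", "k0s", "start"],
    ]


def ssh(host: str, argv: List[str], stdin: Optional[str] = None) -> str:
    """Run argv on host through the local ssh client and return its stdout."""
    result = subprocess.run(
        ["ssh", host, shlex.join(argv)],
        input=stdin, text=True, capture_output=True, check=True,
    )
    return result.stdout


def join_worker(controller: str, worker: str, token_path: str) -> None:
    token = ssh(controller, token_create_cmd())
    # Assumption: writing the token with `sudo tee` is acceptable; tighten file
    # permissions as needed.
    ssh(worker, ["sudo", "tee", token_path], stdin=token)
    for cmd in worker_join_cmds(token_path):
        ssh(worker, cmd)
```

Keeping the command builders separate from the SSH transport makes it easy to swap in whatever remote-execution layer `cfctl` already uses.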
Ejecting a node is also easy: cordon the node, drain it, then run `kubectl delete node`. Finally, delete the VM.
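
A minimal sketch of that ejection sequence, assuming `kubectl` is configured for the cluster. The drain flags (`--ignore-daemonsets`, `--delete-emptydir-data`, `--timeout`) are one plausible choice rather than a requirement, and `delete_vm` is a hypothetical callback standing in for the Exoscale call that destroys the instance.

```python
import subprocess
from typing import Callable, List


def eject_cmds(node: str, drain_timeout: str = "5m") -> List[List[str]]:
    # `kubectl drain` also cordons the node; the explicit cordon mirrors the
    # sequence described in the text.
    return [
        ["kubectl", "cordon", node],
        ["kubectl", "drain", node, "--ignore-daemonsets",
         "--delete-emptydir-data", f"--timeout={drain_timeout}"],
        ["kubectl", "delete", "node", node],
    ]


def _run(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)


def eject_node(node: str, delete_vm: Callable[[str], None],
               run: Callable[[List[str]], None] = _run) -> None:
    """Remove the node from Kubernetes first, then delete the VM behind it."""
    for cmd in eject_cmds(node):
        run(cmd)
    # Only reached if every kubectl step succeeded (the default runner raises on failure).
    delete_vm(node)
```

Deleting the VM last means a failed drain leaves the machine running instead of killing workloads abruptly.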
This feature will certainly take time.