Research and implement an MVP for cluster autoscaling

Open Darkness4 opened this issue 3 years ago • 0 comments

Let's focus on Exoscale and a single availability group first.

There are multiple solutions, but this is my proposal. The idea is based on Slurm cloud-bursting. We need to solve two problems:

  1. How to spawn a VM that will join a private network? The private repository with Ansible can help us.
  2. How to make the VM join the cluster? cfctl could be the answer.

Therefore, we should combine these two features in cfctl, because cfctl is similar to our Ansible.

  • [ ] Implement node creation on Exoscale in cfctl.
  • [ ] Implement a join mechanism. It's easy: health-check the VM over SSH first, then use cfctl apply (or whatever).
  • [ ] Implement the gRPC API for Cluster Autoscaler. Ref: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/externalgrpc/README.md . Do not ignore the considerations listed there. Hetzner seems to be a good example of the expected user experience.
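The SSH health check in the join step could be sketched roughly as follows: poll the new VM's SSH port until it accepts TCP connections, and only then hand over to cfctl apply. The function names, retry counts, and the injectable `probe` and `sleep` parameters are assumptions made for illustration. None of this is existing cfctl code, and it only checks TCP reachability, not that an SSH login succeeds.

```python
import socket
import time
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def wait_for_ssh(
    host: str,
    port: int = 22,
    attempts: int = 30,
    delay: float = 5.0,
    probe: Callable[[str, int], bool] = tcp_probe,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the SSH port answers or the attempts run out."""
    for attempt in range(attempts):
        if probe(host, port):
            return True
        # Do not sleep after the last failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Once `wait_for_ssh(...)` returns True, the autoscaler would run cfctl apply against the updated host list. If it returns False, the VM should probably be destroyed instead of left half-provisioned.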

To add more details, here is the join mechanism for k0s:

  1. On a controller node, call k0s token create --role=worker to create a join token. For extra security, the token should have an expiration time: k0s token create --role=worker --expiry=1h.
  2. On the new virtual machine, via SSH, install k0s and call sudo k0s install worker --token-file /path/to/token/file.
  3. Then start: sudo k0s start.
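As a minimal sketch, the three steps above can be expressed as command lines that an autoscaler would run over SSH. The k0s commands are the ones quoted in the steps. The helper names, the `ssh` wrapping, and the example host and token path are hypothetical. Installing the k0s binary and copying the token file to the VM are left out.

```python
import shlex
from typing import List


def token_create_cmd(expiry: str = "1h") -> List[str]:
    """Command to run on a controller node; prints a worker join token."""
    return ["k0s", "token", "create", "--role=worker", f"--expiry={expiry}"]


def worker_join_cmds(token_file: str) -> List[List[str]]:
    """Commands to run on the new VM once k0s and the token file are present."""
    return [
        ["sudo", "k0s", "install", "worker", "--token-file", token_file],
        ["sudo", "k0s", "start"],
    ]


def over_ssh(target: str, argv: List[str]) -> List[str]:
    """Wrap a command so it runs on `target` (e.g. user@host) through ssh."""
    # shlex.join quotes each argument so the remote shell sees the same argv.
    return ["ssh", target, shlex.join(argv)]
```

For example, `over_ssh("admin@10.0.0.5", cmd)` for each `cmd` in `worker_join_cmds("/etc/k0s/token")` gives argument lists that can be passed to `subprocess.run(..., check=True)`.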

Ejecting a node is also easy: cordon the node, drain it, then run kubectl delete node. After that, delete the VM.
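The eject sequence could look roughly like this sketch. The `run` and `delete_vm` callables are hypothetical hooks (a command runner and the cloud-side VM deletion), and the `--ignore-daemonsets` flag on the drain step is an assumption about what a typical cluster needs, not something decided in this issue.

```python
from typing import Callable, List


def eject_cmds(node: str) -> List[List[str]]:
    """kubectl commands that remove `node` from the cluster, in order."""
    return [
        ["kubectl", "cordon", node],
        ["kubectl", "drain", node, "--ignore-daemonsets"],
        ["kubectl", "delete", "node", node],
    ]


def eject_node(
    node: str,
    run: Callable[[List[str]], None],
    delete_vm: Callable[[str], None],
) -> None:
    """Run the kubectl steps, then delete the VM.

    `run` is expected to raise on failure, so the VM is only deleted
    after the node has been drained and removed from the cluster.
    """
    for argv in eject_cmds(node):
        run(argv)
    delete_vm(node)
```

A real runner could be `lambda argv: subprocess.run(argv, check=True)`. The VM deletion would go through whatever Exoscale integration cfctl ends up with.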

This feature will certainly take time.

Darkness4, Jun 26 '22 01:06