anthill

Control size of cluster via thresholds & limits

Open JohnStrunk opened this issue 6 years ago • 0 comments

Describe the feature you'd like to have.

Instead of requiring the cluster size to be set manually (#11), the cluster should be able to grow and shrink as needed. This dynamic sizing should be subject to limits to contain costs.

What is the value to the end user? (why is it a priority?)

With manual sizing, the admin must constantly monitor the cluster and adjust the number of nodes as storage usage changes. This requires considerable knowledge and a willingness to probe the cluster's state to track free space. Instead, admins should be able to specify the minimum and maximum cluster size they are willing to accept, and the operator should size the cluster dynamically within those bounds, trading off cost (a larger cluster) against the spare capacity available for new volume allocations.

How will we know we have a good solution? (acceptance criteria)

  • When unallocated capacity falls below a configurable threshold, the cluster should be expanded
  • When unallocated capacity exceeds a configurable threshold, the cluster should be contracted
  • There should be a configurable minimum and maximum number of nodes allowed
  • There should be a configurable maximum amount of capacity allowed
  • The operator should maintain a good distribution of nodes across the various fault domains (#13)
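One way the acceptance criteria could translate into a single reconciliation decision is sketched here in Python. It is a minimal sketch under stated assumptions, not the operator's design: `ScalingPolicy` and `desired_node_count` are hypothetical names, every node is assumed to contribute the same raw capacity, and the cluster moves by at most one node per decision. Shrinking is allowed only when the smaller cluster would still sit at or above the expand threshold, so the two thresholds do not undo each other's work.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical field names; the issue only says these values are configurable.
    min_free_fraction: float  # expand when unallocated/total drops below this
    max_free_fraction: float  # contract when unallocated/total rises above this
    min_nodes: int
    max_nodes: int
    max_capacity_bytes: int   # upper bound on total raw capacity
    node_capacity_bytes: int  # assumption: every node contributes the same capacity


def desired_node_count(policy: ScalingPolicy, nodes: int, allocated_bytes: int) -> int:
    """Return the node count to move toward, changing by at most one node."""
    total = nodes * policy.node_capacity_bytes
    free_fraction = (total - allocated_bytes) / total if total else 0.0

    target = nodes
    if free_fraction < policy.min_free_fraction:
        target = nodes + 1
    elif free_fraction > policy.max_free_fraction:
        smaller_total = (nodes - 1) * policy.node_capacity_bytes
        # Shrink only if the smaller cluster would not immediately trip the expand threshold.
        if smaller_total > 0 and (smaller_total - allocated_bytes) / smaller_total >= policy.min_free_fraction:
            target = nodes - 1

    # Clamp to the node-count limits and to the capacity cap (min_nodes wins on conflict).
    nodes_allowed_by_capacity = policy.max_capacity_bytes // policy.node_capacity_bytes
    upper = min(policy.max_nodes, nodes_allowed_by_capacity)
    return max(policy.min_nodes, min(upper, target))
```

For example, with 100-unit nodes, a 20%/50% threshold pair and a cap of 800 units, a 4-node cluster holding 350 units (12.5% free) would be asked to grow to 5 nodes, while an 8-node cluster holding 790 units stays at 8 because of the capacity cap.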

Work items

  • [ ] Operator monitors unallocated capacity
  • [ ] Operator uses monitored capacity to increase/decrease node count

Additional context

We'll need to consider:

  • How to rate limit growth/shrink
  • How to handle allocations greater than cluster free space
  • How to intelligently spread nodes across domains (What if some are full and others are not? How do we know?)
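For rate limiting and domain spreading, one possible approach is a cooldown that permits at most one scaling action per window, combined with a rule that grows the least-populated fault domain that can still accept a node and shrinks the most-populated one. This Python sketch assumes the caller already knows the per-domain node counts and which domains are full; how to obtain that information is one of the open questions above, so `full_domains` is a hypothetical input, as are all class and function names here.

```python
import time
from typing import Callable, Dict, Optional, Set


class ScaleRateLimiter:
    """Hypothetical policy: allow at most one scaling action per cooldown window."""

    def __init__(self, cooldown_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock
        self.last_action: Optional[float] = None

    def try_acquire(self) -> bool:
        now = self.clock()
        if self.last_action is not None and now - self.last_action < self.cooldown:
            return False  # still inside the cooldown window
        self.last_action = now
        return True


def pick_domain_to_grow(nodes_per_domain: Dict[str, int], full_domains: Set[str]) -> Optional[str]:
    """Pick the non-full domain with the fewest nodes; ties broken by smallest name."""
    candidates = [d for d in nodes_per_domain if d not in full_domains]
    if not candidates:
        return None
    return min(candidates, key=lambda d: (nodes_per_domain[d], d))


def pick_domain_to_shrink(nodes_per_domain: Dict[str, int]) -> Optional[str]:
    """Pick the domain with the most nodes; ties broken by largest name."""
    candidates = [d for d, n in nodes_per_domain.items() if n > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda d: (nodes_per_domain[d], d))
```

Injecting the clock keeps the limiter testable without sleeping. A single fixed cooldown is the simplest choice; separate windows for growth and shrinkage (for example, shrinking more slowly than growing) would be a natural refinement.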

JohnStrunk, Jun 28 '18 20:06