gaffer-docker

Add autoscaling to Helm charts

Open d47853 opened this issue 5 years ago • 1 comments

Gaffer should scale up when HDFS starts filling up and scale down when it becomes under-utilised. A nice-to-have feature would be the ability to do this on a schedule as well, so that Gaffer can in effect shut down at night to reduce resource usage and costs when deployed on a Kubernetes cluster.
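As a rough illustration of the behaviour requested here, the decision logic could look something like the sketch below. It is not part of the charts: the function name, the utilisation thresholds, the replica bounds and the night window are all assumptions made up for the example, and it only computes a target replica count rather than talking to Kubernetes.

```python
from datetime import time


def desired_replicas(
    current: int,
    hdfs_used_fraction: float,
    now: time,
    *,
    min_replicas: int = 3,        # assumed lower bound while running
    max_replicas: int = 10,       # assumed upper bound
    scale_up_above: float = 0.8,  # assumed "HDFS is filling up" threshold
    scale_down_below: float = 0.3,  # assumed "under-utilised" threshold
    night_start: time = time(22, 0),  # assumed shutdown window start
    night_end: time = time(6, 0),     # assumed shutdown window end
) -> int:
    """Return a target replica count from HDFS utilisation and time of day."""
    # The night window is assumed to wrap past midnight (e.g. 22:00 to 06:00).
    in_night_window = now >= night_start or now < night_end
    if in_night_window:
        return 0  # scheduled shutdown
    if hdfs_used_fraction > scale_up_above:
        return min(current + 1, max_replicas)
    if hdfs_used_fraction < scale_down_below:
        return max(current - 1, min_replicas)
    # Otherwise keep the current size, clamped to the allowed range
    # (this also brings the deployment back up after a night shutdown).
    return max(min_replicas, min(current, max_replicas))
```

Stepping by one replica at a time is a deliberately conservative choice for the sketch; a real autoscaler would also need to account for how long HDFS takes to rebalance.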

d47853, Jun 29 '20 09:06

Scaling Accumulo's tablet servers should be easy enough, as they are pretty much stateless. However, I just want to add a note of caution about scaling the HDFS data nodes. Adding data nodes is straightforward, but removing them is more complicated. With the default way we are running HDFS (replication factor 3, with no topology config), we can remove at most 2 data nodes at a time; remove 3 or more at once and you are pretty much guaranteed to lose data, because some blocks will have had all of their replicas on the removed nodes. You then need to wait for the blocks to be fully re-replicated across the remaining nodes before removing any more.
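To make the arithmetic concrete, here is a minimal sketch of the batching constraint, assuming no topology config so that any block's replicas may sit on any set of data nodes. The function names and node names are hypothetical, and the wait for re-replication between batches is not modelled.

```python
def max_simultaneous_removals(replication_factor: int) -> int:
    """Most data nodes that can go at once while every block keeps >= 1 replica."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor - 1


def removal_batches(nodes_to_remove: list[str], replication_factor: int = 3) -> list[list[str]]:
    """Split the nodes into batches that are safe to remove together.

    The caller must wait for blocks to be fully re-replicated
    before moving on to the next batch.
    """
    batch_size = max_simultaneous_removals(replication_factor)
    if batch_size == 0:
        raise ValueError("with replication factor 1, removing any node risks data loss")
    return [
        nodes_to_remove[i:i + batch_size]
        for i in range(0, len(nodes_to_remove), batch_size)
    ]
```

For example, removing five nodes with replication factor 3 gives batches of two, two and one.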

Specifying a topology should help, as it would allow us to remove any number of data nodes from 2 of the 3 zones at once. However, we currently rely on the StatefulSet controller to add/remove/replace data node Pods. AFAIK it doesn't give you control over which exact data node Pod gets removed; I think it's only LIFO. We might need to look into creating our own operator if we can't find something suitable in the open-source community.
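The LIFO behaviour can be sketched as follows: StatefulSet Pods are named `<name>-<ordinal>`, and on scale-down the highest ordinals are terminated first. The helper and the StatefulSet name `hdfs-datanode` are hypothetical and only illustrate which Pods would go; nothing here queries a cluster.

```python
def pods_removed_on_scale_down(statefulset_name: str, current: int, desired: int) -> list[str]:
    """List the Pods a StatefulSet would terminate, in termination order.

    Pods are removed in reverse ordinal order, so there is no way to
    pick, say, only the Pods that live in a particular zone.
    """
    if desired < 0 or current < 0:
        raise ValueError("replica counts must be non-negative")
    if desired >= current:
        return []  # scaling up or no change: nothing is removed
    return [f"{statefulset_name}-{ordinal}" for ordinal in range(current - 1, desired - 1, -1)]
```

Scaling a five-replica `hdfs-datanode` StatefulSet down to three would therefore remove `hdfs-datanode-4` and then `hdfs-datanode-3`, regardless of which zones those Pods happen to run in.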

ctas582, Jun 29 '20 14:06