gaffer-docker
Add autoscaling to Helm charts
Gaffer should scale up when HDFS starts filling up, and scale down when it becomes under-utilised. A nice-to-have feature would be the ability to do this on a schedule as well, so that Gaffer can in effect shut down at night to reduce resource usage and costs when deployed on a Kubernetes cluster.
Scaling Accumulo's tablet servers should be easy enough, as they are pretty much stateless. However, I just want to add a note of caution about scaling the HDFS data nodes. Adding data nodes is straightforward, but removing them is more complicated. With the default way we are running HDFS (replication factor 3, with no topology config), we can only remove 2 data nodes at a time. Remove 3 or more at once and some block will almost certainly lose all of its replicas, so we are pretty much guaranteed to lose data. We then need to wait for blocks to be fully re-replicated across the remaining nodes before removing any more.
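To make that constraint concrete, here is a rough sketch of how a scale-down could be split into batches, with a wait for re-replication between each one. It is only an illustration, not anything that exists in the charts: the function name and its parameters are made up, and it assumes the "replication factor minus one" limit described above.

```python
from typing import List


def plan_datanode_removals(current: int, target: int, replication_factor: int = 3) -> List[int]:
    """Return how many data nodes to remove in each batch (hypothetical helper).

    Assumes no topology config, so at most (replication_factor - 1) nodes may
    go away before waiting for blocks to be re-replicated.
    """
    if target > current:
        raise ValueError("target must not exceed the current node count")
    max_per_batch = replication_factor - 1
    if max_per_batch < 1:
        raise ValueError("cannot safely remove nodes with replication factor < 2")

    batches: List[int] = []
    remaining = current - target
    while remaining > 0:
        step = min(max_per_batch, remaining)
        batches.append(step)  # the caller must wait for re-replication after each batch
        remaining -= step
    return batches


if __name__ == "__main__":
    # Going from 10 data nodes to 5 takes three rounds.
    print(plan_datanode_removals(10, 5))
```

Each entry in the returned list is one round of removals, so going from 10 to 5 data nodes means three separate waits for HDFS to catch up.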
Specifying a topology should help, as it would allow us to remove any number of data nodes from 2 of the 3 zones at once. However, we currently rely on the StatefulSet controller to add/remove/replace data node Pods. AFAIK it doesn't give you control over which exact data node Pod gets removed. I think it's only LIFO: the Pod with the highest ordinal goes first. We might need to look into creating our own operator if we can't find something suitable in the open-source community.
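For reference, a small sketch of the removal order you get when a StatefulSet is scaled down with the default ordered Pod management: Pods are named `<statefulset-name>-<ordinal>` and are terminated from the highest ordinal downwards. The StatefulSet name `hdfs-datanode` below is just a placeholder, not necessarily what the chart uses.

```python
from typing import List


def pods_removed_on_scale_down(statefulset: str, current: int, target: int) -> List[str]:
    """List the Pods a StatefulSet scale-down removes, in removal order.

    Models the default ordered behaviour: highest ordinal first.
    """
    if target < 0 or target > current:
        raise ValueError("target must be between 0 and the current replica count")
    return [f"{statefulset}-{ordinal}" for ordinal in range(current - 1, target - 1, -1)]


if __name__ == "__main__":
    # "hdfs-datanode" is a hypothetical StatefulSet name used for illustration.
    print(pods_removed_on_scale_down("hdfs-datanode", 5, 3))
```

The point is that the set of removed Pods is fixed entirely by the replica counts, so there is no way to say "take these two from zone A" without something smarter driving the scale-down.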