clickhouse-docs
Update scaling public docs to expose behavior and thresholds
The public scaling docs currently do not fully explain autoscaling behavior, in particular the thresholds we use to scale based on CPU and memory usage. We should update the docs to include the following details:
- CPU-based autoscaling: We scale up (double the CPU allocation) if CPU usage crosses an upper threshold in the range of 50-75% (the actual threshold depends on the size of the cluster). If CPU usage falls below the lower threshold, which is half of the upper threshold (for example, 25% when the upper threshold is 50%), we recommend downscaling the service and halving the CPU allocation.
- Memory-based autoscaling: We recommend scaling memory to 125% of the maximum memory usage, or up to 150% if we encounter OOMs (out-of-memory errors).
- Lookback window: We look at data over the past 30 hours to make scaling decisions.
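
To make the rules above concrete for the docs, here is a minimal Python sketch of the decision logic as described in this issue. It is an illustration, not the actual autoscaler: the function names, the constant `LOOKBACK_HOURS`, and the idea that the caller passes in a single CPU usage figure already aggregated over the lookback window are all assumptions. The sketch also assumes the lower CPU threshold is exactly half of the upper one, and that an OOM always leads to the full 150% memory recommendation, although the text only says "up to 150%".

```python
LOOKBACK_HOURS = 30  # window of usage data considered, per the list above


def recommend_cpu_factor(cpu_usage_pct: float, upper_threshold_pct: float) -> float:
    """Return the multiplier to apply to the CPU allocation (hypothetical helper).

    2.0 means scale up (double), 0.5 means scale down (halve), 1.0 means no change.
    `cpu_usage_pct` is assumed to be pre-aggregated over the lookback window.
    """
    if not 50.0 <= upper_threshold_pct <= 75.0:
        raise ValueError("upper threshold is expected to be in the 50-75% range")

    # Assumption: the lower threshold is half of the upper threshold.
    lower_threshold_pct = upper_threshold_pct / 2

    if cpu_usage_pct > upper_threshold_pct:
        return 2.0
    if cpu_usage_pct < lower_threshold_pct:
        return 0.5
    return 1.0


def recommend_memory(max_memory_usage: float, saw_oom: bool) -> float:
    """Return the recommended memory allocation, in the same unit as the input.

    Assumption: an OOM in the window always triggers the full 150% headroom.
    """
    headroom = 1.5 if saw_oom else 1.25
    return max_memory_usage * headroom
```

For example, with a 50% upper threshold, `recommend_cpu_factor(60, 50)` recommends doubling the CPU allocation and `recommend_cpu_factor(20, 50)` recommends halving it, while `recommend_memory(8, saw_oom=False)` recommends 10 for a peak usage of 8 (in whatever unit the caller uses).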
NOTE: We are also working on improving these thresholds and scale-down windows, which will happen with MBB and other work in progress.