pangeo-cloud-federation
pangeo-cloud-federation copied to clipboard
cluster autoscaler was broken on AWS
tried to log into https://aws-uswest2.pangeo.io today but was not getting a server. turns out our core node (running on spot) was shut down and when the cluster-autoscaler pod was recreated it was crashing due to https://github.com/kubernetes/autoscaler/issues/3506.
just wanted to document here another case of things running on k8s will eventually break due to version incompatibilities or other things. The fix was pretty easy, and configuration files are here https://github.com/pangeo-data/pangeo-cloud-federation/pull/968.
Just had to give the autoscaler pod more memory, so we went from:
- image: us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.16.5
name: cluster-autoscaler
resources:
limits:
cpu: 100m
memory: 300Mi
requests:
cpu: 100m
memory: 300Mi
to:
- image: us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.16.7
name: cluster-autoscaler
resources:
limits:
cpu: 100m
memory: 600Mi
requests:
cpu: 100m
memory: 600Mi