Bug: scheduler has negative "buffer" value
Environment
Prod (occurred twice recently)
Steps to reproduce
Not yet clear. Here's an example:
```json
{"level":"info","ts":1709922373.111944,"logger":"autoscale-scheduler","caller":"plugin/state.go:1379","msg":"Adding VM pod to node","action":"read cluster state","virtualmachine":{"namespace":"default","name":"compute-falling-cake-a6d84vya"},"pod":{"namespace":"default","name":"compute-falling-cake-a6d84vya-dv647"},"node":"i-0d216a75a106c181d.us-west-2.compute.internal","verdict":{"cpu":"pod = 0.25/0.25 (node 14.25 -> 14.5 / 127.61, 0 -> 4.294967046e+06 buffer)","mem":"pod = 1Gi/1Gi (node 57Gi -> 58Gi / 519497968Ki, 0 -> -1Gi buffer"}}
```
I think this is entirely caused by faulty logic in `(*AutoscaleEnforcer).readClusterState()`, but I haven't looked into it thoroughly.

Honestly, it's also a little odd that `readClusterState` has its own implementation of the reserve logic, rather than using the shared version that was added in #666.
Expected result
Any buffer value from adding a VM should be non-negative.
Actual result
The memory "buffer" value was negative (see `-1Gi buffer` in the log above), and the CPU value underflowed.