autoscaling
autoscaling copied to clipboard
Bug: Scheduler metrics can show buffer when there isn't any
Environment
Prod (eu-central-1)
Steps to reproduce
Unknown
Expected result
The node metrics reported by the scheduler should always match its internal state.
Actual result
The scheduler showed an extended period of non-zero buffer CPU/memory, even though the state dump showed no nodes with non-zero buffer CPU or memory (gist; collected by ntk a scheduler-state
).
The scheduler was restarted at about 2023-12-05 23:57, at which point the issue was resolved.
Note that this was happening in at the same time as some other node-level disruptions on the same kubernetes node that the scheduler was on (slack link for more), so this there may have been some related impact there.
Other logs, links
- ...