autoscaling
Postgres vertical autoscaling in k8s
First PR, introducing unit tests. The old functional tests are temporarily disabled. Part of #763
taskgroup.Group is an improved version of errgroup.Group with two changes: 1. Support for multierrors 2. Errors are logged when they are returned. Part of #921
Probably best to review each commit separately. Most of the commits here are just repositioning to make the final commit, which adds LFC metrics collection, as simple as possible. ~~Builds...
## Motivation Since the pod lifetime can diverge from the VM lifetime, we have to choose one of: - At least one pod per VM - At most one pod...
Trying out this approach as a potential way to make implementing overcommit (#517) easier, to avoid dealing with rounding errors when responding to the autoscaler-agents. I think this "speculative reserve"...
Resolves #517. (and a few other commits tossed in there as well) First two commits are refactorings to minimize the diff of the third commit, which is the actual overcommitting...
## Problem description / Motivation We don't have clear signals for when the scheduler denies upscaling, or the vm-monitor denies downscaling. This came up here: https://neondb.slack.com/archives/C03TN5G758R/p1716167732610919?thread_ts=1716166712.233079 ## Feature idea(s) /...
The following error message is observed: `cannot create /dev/virtio-ports/neon.tech.log.0: Device or resource busy`. This has happened together with memory pressure, and OOM-kill of compute_ctl. The leading hypothesis on this is:...
This might help to debug the issues when we have a lot of VMs failing to reconcile. Although, it is unclear if repeated conflicts for the same VM are likely...