Postgres vertical autoscaling in k8s

Results 89 autoscaling issues

## Motivation A continuation of https://github.com/neondatabase/cloud/issues/9672 and #837. We now have the script, but this is not the final state. ## Requirements Ideally, we want to: 1. Have a button...

t/Epic

## Environment Production ## Steps to reproduce Open [logs](https://neonprod.grafana.net/d/IJSLHBOnk/neon-logs-compute-nodes-by-compute-id?orgId=1&var-datasource=grafanacloud-logs&var-compute=compute-solitary-dawn-88692613&var-search=&var-exclude=substing%20to%20exclude&from=now-24h&to=now) and find a line with `ts=..`, e.g. ``` 2024-01-16 00:01:32.140 ts=2024-01-16T00:01:28.288Z caller=proc.go:250 msg="Excluded databases" databases=[] ``` ## Expected result The...

t/bug

## Motivation We got a couple of high priority customer requests to support bigger compute units than we currently support (8 CUs). Supporting that would allow onboarding a class of...

t/Epic

Optional for neondatabase/company_projects#187. ## Problem description / Motivation The neonvm-controller's reconcile functions are huge and complex, with the majority of testing coming from system-wide end-to-end tests. This leaves us over-exposed...

a/test
c/autoscaling/neonvm

## Environment Prod (occurred twice recently) ## Steps to reproduce Not yet clear. Here's an example: ``` {"level":"info","ts":1709922373.111944,"logger":"autoscale-scheduler","caller":"plugin/state.go:1379","msg":"Adding VM pod to node","action":"read cluster state","virtualmachine":{"namespace":"default","name":"compute-falling-cake-a6d84vya"},"pod":{"namespace":"default","name":"compute-falling-cake-a6d84vya-dv647"},"node":"i-0d216a75a106c181d.us-west-2.compute.internal","verdict":{"cpu":"pod = 0.25/0.25 (node 14.25 -> 14.5...

t/bug
c/autoscaling/scheduler

## Environment Prod (eu-central-1) ## Steps to reproduce Unknown ## Expected result The node metrics reported by the scheduler should always match its internal state. ## Actual result The scheduler...

t/bug
c/autoscaling/scheduler

Observed this when a VM failed to start: the Pod was assigned to a node, but the kubelet prevented it from starting. ``` Events: Type Reason Age From Message ---- ------ ---- ----...

t/bug

## Motivation ARM instances in AWS are in most cases cheaper than x86 instances. Thus, to reduce COGS, it makes sense to switch to ARM instances. There are a couple...

t/Epic

## Problem description / Motivation Currently the load on the scheduler is somewhat unusual: we have (usually) short (but uneven) lifetimes of computes, with varying external load producing regular usage...

t/feature
c/autoscaling/scheduler

## Problem description / Motivation As an example, #807 was caused by failing to call `.Inc()` on the `prometheus.Counter` returned by `WithLabelValues(...)`. It'd be good to have automated checking of...

a/reliability