autoscaling issues

Results: 89 autoscaling issues

Epic: automated autoscaling release workflow

## Motivation A continuation of https://github.com/neondatabase/cloud/issues/9672 and #837 We now have the script, but this is not a final state. ## Requirements Ideally, we want to: 1. Have a button...

Omrigan

t/Epic

Bug: compute node's logs have two sets of mismatching timestamps

## Environment Production ## Steps to reproduce Open [logs](https://neonprod.grafana.net/d/IJSLHBOnk/neon-logs-compute-nodes-by-compute-id?orgId=1&var-datasource=grafanacloud-logs&var-compute=compute-solitary-dawn-88692613&var-search=&var-exclude=substing%20to%20exclude&from=now-24h&to=now) and find a line with `ts=..`, e.g. ``` 2024-01-16 00:01:32.140 ts=2024-01-16T00:01:28.288Z caller=proc.go:250 msg="Excluded databases" databases=[] ``` ## Expected result The...

Omrigan

t/bug

Epic: Support fixed bigger compute units

## Motivation We got a couple of high priority customer requests to support bigger compute units than we currently support (8 CUs). Supporting that would allow onboarding a class of...

stradig

t/Epic

Basic unit testing coverage for neonvm-controller

Optional for neondatabase/company_projects#187. ## Problem description / Motivation The neonvm-controller's reconcile functions are huge and complex, with the majority of testing coming from system-wide end-to-end tests. This leaves us over-exposed...

sharnoff

a/test

c/autoscaling/neonvm

Bug: scheduler has negative "buffer" value

## Environment Prod (occurred twice recently) ## Steps to reproduce Not yet clear. Here's an example: ``` {"level":"info","ts":1709922373.111944,"logger":"autoscale-scheduler","caller":"plugin/state.go:1379","msg":"Adding VM pod to node","action":"read cluster state","virtualmachine":{"namespace":"default","name":"compute-falling-cake-a6d84vya"},"pod":{"namespace":"default","name":"compute-falling-cake-a6d84vya-dv647"},"node":"i-0d216a75a106c181d.us-west-2.compute.internal","verdict":{"cpu":"pod = 0.25/0.25 (node 14.25 -> 14.5...

sharnoff

t/bug

c/autoscaling/scheduler

Bug: Scheduler metrics can show buffer when there isn't any

## Environment Prod (eu-central-1) ## Steps to reproduce Unknown ## Expected result The node metrics reported by the scheduler should always match its internal state. ## Actual result The scheduler...

sharnoff

t/bug

c/autoscaling/scheduler

Failed scheduling

Observed this when a VM failed to start: the Pod was assigned to a node, but the kubelet prevented it from starting ``` Events: Type Reason Age From Message ---- ------ ---- ----...

cicdteam

t/bug

Epic: Support ARM AWS instances

## Motivation ARM instances in AWS are in most cases cheaper than x86 instances. Thus to reduce COGS it makes sense to switch to ARM instances. There are a couple...

stradig

t/Epic

scheduler plugin should de-prioritize newer nodes

## Problem description / Motivation Currently the load on the scheduler is somewhat unusual: we have (usually) short (but uneven) lifetimes of computes, with varying external load producing regular usage...

sharnoff

t/feature

c/autoscaling/scheduler

linter for unused function return value

## Problem description / Motivation As an example, #807 was caused by failing to call `.Inc()` on the `prometheus.Counter` returned by `WithLabelValues(...)`. It'd be good to have automated checking of...

sharnoff

a/reliability
