VPA doesn't provide any recommendations when a Pod is in an OOMKill CrashLoopBackOff right after start
Which component are you using?: vertical-pod-autoscaler
What version of the component are you using?:
Component version: 0.10.0
What k8s version are you using (kubectl version)?:
kubectl version Output
$ kubectl version --short
Client Version: v1.24.2
Kustomize Version: v4.5.4
Server Version: v1.23.4
What environment is this in?:
What did you expect to happen?: VPA should be able to help with Pods which are in an OOMKill CrashLoopBackOff and raise Limits/Requests until the workload is running.
What happened instead?: VPA did not give a single Recommendation for a Pod that goes into an OOMKill CrashLoopBackOff right from the start
How to reproduce it (as minimally and precisely as possible): Create a deployment that will be OOMKilled right after starting
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oomkilled
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oomkilled
  template:
    metadata:
      labels:
        app: oomkilled
    spec:
      containers:
      - image: gcr.io/google-containers/stress:v1
        name: stress
        command: [ "/stress" ]
        args:
        - "--mem-total"
        - "104858000"
        - "--logtostderr"
        - "--mem-alloc-size"
        - "10000000"
        resources:
          requests:
            memory: 1Mi
            cpu: 5m
          limits:
            memory: 20Mi
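With these values the stress binary tries to allocate --mem-total 104858000 bytes (roughly 100 MiB) in --mem-alloc-size 10000000 byte (10 MB) steps against a 20Mi memory limit, so the container is OOM-killed within about a second of starting, which is exactly what the container status below shows.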
Look at the container
(...)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 20 Jun 2022 16:56:47 +0200
      Finished:     Mon, 20 Jun 2022 16:56:48 +0200
    Ready:          False
    Restart Count:  5
(...)
Create a VPA object for this deployment
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
name: oomkilled-vpa
spec:
targetRef:
apiVersion: "apps/v1"
kind: Deployment
name: oomkilled
resourcePolicy:
containerPolicies:
- containerName: '*'
minAllowed:
cpu: 5m
memory: 10Mi
maxAllowed:
cpu: 1
memory: 5Gi
controlledResources: ["cpu", "memory"]
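After applying both manifests, the missing recommendation is easy to see directly, e.g. with kubectl describe vpa oomkilled-vpa or kubectl get vpa oomkilled-vpa -o yaml; the relevant recommender log lines and the resulting VPA object are shown below.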
VPA does observe the corresponding OOMKill events in the Recommender logs
I0620 14:55:04.340502 1 cluster_feeder.go:465] OOM detected {Timestamp:2022-06-20 14:53:52 +0000 UTC Memory:1048576 ContainerID:{PodID:{Namespace:default PodName:oomkilled-6868f896d6-6vfqm} ContainerName:stress}}
I0620 14:55:04.340545 1 cluster_feeder.go:465] OOM detected {Timestamp:2022-06-20 14:54:08 +0000 UTC Memory:1048576 ContainerID:{PodID:{Namespace:default PodName:oomkilled-6868f896d6-6vfqm} ContainerName:stress}}
VPA Status is empty
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling.k8s.io/v1","kind":"VerticalPodAutoscaler","metadata":{"annotations":{},"name":"oomkilled-vpa","namespace":"default"},"spec":{"resourcePolicy":{"containerPolicies":[{"containerName":"*","controlledResources":["cpu","memory"],"maxAllowed":{"cpu":1,"memory":"5Gi"},"minAllowed":{"cpu":"5m","memory":"10Mi"}}]},"targetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"oomkilled"}}}
  creationTimestamp: "2022-06-20T14:54:16Z"
  generation: 2
  name: oomkilled-vpa
  namespace: default
  resourceVersion: "299374"
  uid: f47d84a8-aa6e-4042-b0a4-723888720a9d
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      controlledResources:
      - cpu
      - memory
      maxAllowed:
        cpu: 1
        memory: 5Gi
      minAllowed:
        cpu: 5m
        memory: 10Mi
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: oomkilled
  updatePolicy:
    updateMode: Auto
status:
  conditions:
  - lastTransitionTime: "2022-06-20T14:55:04Z"
    status: "False"
    type: RecommendationProvided
  recommendation: {}
The VPACheckpoint doesn't record any measurements either – its cpuHistogram is completely empty and the memoryHistogram only carries a reference timestamp
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscalerCheckpoint
metadata:
  creationTimestamp: "2022-06-20T14:55:04Z"
  generation: 24
  name: oomkilled-vpa-stress
  namespace: default
  resourceVersion: "304997"
  uid: 127a6331-7d1d-4ea6-b56a-63db3ee07a51
spec:
  containerName: stress
  vpaObjectName: oomkilled-vpa
status:
  cpuHistogram:
    referenceTimestamp: null
  firstSampleStart: null
  lastSampleStart: null
  lastUpdateTime: "2022-06-20T15:18:04Z"
  memoryHistogram:
    referenceTimestamp: "2022-06-22T00:00:00Z"
  version: v3
The Pod in CrashLoopBackOff doesn't have any PodMetrics, whereas other Pods do have metrics
k get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/" | jq
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metadata": {
        "name": "hamster-96d4585b7-b9tl9",
        "namespace": "default",
        "creationTimestamp": "2022-06-20T15:24:37Z",
        "labels": {
          "app": "hamster",
          "pod-template-hash": "96d4585b7"
        }
      },
      "timestamp": "2022-06-20T15:24:01Z",
      "window": "56s",
      "containers": [
        {
          "name": "hamster",
          "usage": {
            "cpu": "498501465n",
            "memory": "512Ki"
          }
        }
      ]
    },
    {
      "metadata": {
        "name": "hamster-96d4585b7-c44j7",
        "namespace": "default",
        "creationTimestamp": "2022-06-20T15:24:37Z",
        "labels": {
          "app": "hamster",
          "pod-template-hash": "96d4585b7"
        }
      },
      "timestamp": "2022-06-20T15:24:04Z",
      "window": "57s",
      "containers": [
        {
          "name": "hamster",
          "usage": {
            "cpu": "501837091n",
            "memory": "656Ki"
          }
        }
      ]
    }
  ]
}
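For completeness, the same List call can be reproduced from Go with the standard metrics clientset (a minimal sketch, not the recommender's actual client code; it assumes a kubeconfig in the default location). The CrashLooping Pod simply never appears in the returned items, so there is nothing for the recommender to sample:

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/tools/clientcmd"
    metricsclientset "k8s.io/metrics/pkg/client/clientset/versioned"
)

func main() {
    // Assumption: running outside the cluster with a kubeconfig at the default path.
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client, err := metricsclientset.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // Equivalent to the raw List call above: metrics.k8s.io/v1beta1 PodMetrics in "default".
    podMetrics, err := client.MetricsV1beta1().PodMetricses("default").List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    for _, pm := range podMetrics.Items {
        // The oomkilled Pod is missing from this list entirely, so the recommender
        // never gets a usage sample (and thus no TotalSamplesCount increment) for it.
        fmt.Printf("%s/%s: %d container metrics\n", pm.Namespace, pm.Name, len(pm.Containers))
    }
}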
- The above List call is what the VPA Recommender uses to get metrics for all the Pods and then increases the TotalSamplesCount for the individual Containers for every CPUSample in that List of PodMetrics.
- OOMKill events are recorded as MemorySamples, therefore they also don't increase the TotalSamplesCount.
- This container most likely doesn't get any recommendation, because its TotalSamplesCount is 0 (see the sketch below this list).
- Seems like others have seen this as well (and tried to resolve this by switching to a different metrics source): https://github.com/kubernetes-sigs/metrics-server/issues/976#issuecomment-1076102124
- People really don't want metrics for terminated containers, these things were added intentionally:
  - metrics-server intentionally doesn't provide PodMetrics for Pods with terminated containers – this resulted in Pods with init-containers not having any metrics when a cAdvisor refactoring included init containers in the kubelet summary API again [1, 2]
  - kubelet is meant to only provide metrics for non-terminated Pods and for running Containers
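To make the bullet points above concrete, here is a deliberately simplified, self-contained sketch of the bookkeeping (the type and method names here are hypothetical stand-ins, not the real ones from pkg/recommender/model): only CPU samples grow the sample count, memory/OOM samples only feed the peaks, and a container with a count of 0 never gets a recommendation.

package main

import "fmt"

// Simplified stand-in for AggregateContainerState: the real code keeps
// histograms, this only mirrors how samples are counted.
type aggregateState struct {
    totalSamplesCount int       // only ever incremented for CPU samples
    memoryPeaks       []float64 // stand-in for AggregateMemoryPeaks
}

func (a *aggregateState) addCPUSample(cores float64) {
    // The only path that grows the count (the real code also feeds a CPU histogram).
    a.totalSamplesCount++
}

func (a *aggregateState) addMemorySample(bytes float64) {
    // OOM events arrive on this path as memory samples, so they never
    // touch totalSamplesCount.
    a.memoryPeaks = append(a.memoryPeaks, bytes)
}

// A container with zero samples is treated as having no usable history,
// which is why the VPA status above shows RecommendationProvided=False.
func (a *aggregateState) canRecommend() bool {
    return a.totalSamplesCount > 0
}

func main() {
    s := &aggregateState{}
    s.addMemorySample(1048576) // the OOM samples seen in the recommender log
    s.addMemorySample(1048576)
    fmt.Println("can recommend:", s.canRecommend()) // false: never a single CPU sample
}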
Anything else we need to know?:
On the same cluster, the hamster example works perfectly fine and gets recommendations as expected, so this is not a general issue with the VPA.
I just applied, for fun, the patch below, which increases the TotalSamplesCount whenever a memory sample (i.e. also an OOMKill sample) is added, and afterwards the Pod above gets a recommendation and can run normally – as expected. I understand that the fix cannot be this simple, since it would add two samples for every regular PodMetric (which contains both CPU and memory) and existing code presumably assumes otherwise, but it shows that TotalSamplesCount seems to be the blocker in this situation.
diff --git a/vertical-pod-autoscaler/pkg/recommender/model/aggregate_container_state.go b/vertical-pod-autoscaler/pkg/recommender/model/aggregate_container_state.go
index 3facbe37e..7accd072e 100644
--- a/vertical-pod-autoscaler/pkg/recommender/model/aggregate_container_state.go
+++ b/vertical-pod-autoscaler/pkg/recommender/model/aggregate_container_state.go
@@ -184,6 +184,7 @@ func (a *AggregateContainerState) AddSample(sample *ContainerUsageSample) {
 	case ResourceCPU:
 		a.addCPUSample(sample)
 	case ResourceMemory:
+		a.TotalSamplesCount++
 		a.AggregateMemoryPeaks.AddSample(BytesFromMemoryAmount(sample.Usage), 1.0, sample.MeasureStart)
 	default:
 		panic(fmt.Sprintf("AddSample doesn't support resource '%s'", sample.Resource))
ping @jbartosik that's what I was mentioning in today's SIG call
maybe a.TotalSamplesCount++ should run when OOM is detected...
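A rough sketch of that idea, meant to sit next to the AddSample method shown in the diff above; addOOMSample is a hypothetical method name, not the existing API, and the real OOM handling goes through the cluster feeder / container state code rather than directly through AddSample:

// Hypothetical sketch only: give OOM-derived samples their own path, so a
// regular PodMetric (one CPU + one memory sample) still counts exactly once,
// while a Pod that only ever OOMs also ends up with TotalSamplesCount > 0.
func (a *AggregateContainerState) addOOMSample(sample *ContainerUsageSample) {
    // Record the OOM as a memory peak, like any other memory sample...
    a.AggregateMemoryPeaks.AddSample(BytesFromMemoryAmount(sample.Usage), 1.0, sample.MeasureStart)
    // ...but also count it, so the CrashLoopBackOff case from this issue
    // produces a recommendation instead of an empty status.
    a.TotalSamplesCount++
}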
I think I saw this problem some time ago, when I was implementing OOM tests for VPA.
The test didn't work if memory usage grew too quickly - pods were OOMing but VPA wasn't increasing its recommendation.
My plan is:
- Locally modify the e2e to grow memory usage very quickly, verify that VPA doesn't grow the recommendation,
- Add logging to VPA recommender to see if it's getting information about OOMs (I think here)
- If we get information but it doesn't affect recommendation then debug why (I think this is the most likely case),
- If we don't get the information read up / ask about how we could get it,
- If the test passes even when it grows memory usage very quickly then figure out how it's different from your situation.
I'll be away for the next 2 weeks. I'll only be able to start doing this when I'm back.
Ah, it's good to hear you already saw something similar!
My plan is:
- Locally modify the e2e to grow memory usage very quickly, verify that VPA doesn't grow the recommendation,
- Add logging to VPA recommender to see if it's getting information about OOMs (I think here)
- If we get information but it doesn't affect recommendation then debug why (I think this is the most likely case),
- If we don't get the information read up / ask about how we could get it,
- If the test passes even when it grows memory usage very quickly then figure out how it's different from your situation.

I'll be away for the next 2 weeks. I'll only be able to start doing this when I'm back.
I can also take some time to do this – I don't think the scenario should be too far away from my repro case above. The modifications to the existing OOMObserver make sense to verify that the correct information really is there. In my repro case above I thought the logs here were sufficient evidence that the VPA sees the OOM events with the right amount of memory, and the fact that adding a TotalSamplesCount++ led to the correct recommendation showed that the information in the OOM events was as expected.
Adapted the existing OOMKill test so that the Pods run into OOMKills more quickly and eventually end up in a CrashLoopBackOff: https://github.com/kubernetes/autoscaler/pull/5028
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale