metrics-server
OOM metrics undetected
What happened: I wanted to deliberately create a pod that would go "out of memory" but it seems to run fine.
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl patch deployment metrics-server -n kube-system -p '{"spec":{"template":{"spec":{"containers":[{"name":"metrics-server","args":["--cert-dir=/tmp", "--secure-port=4443", "--kubelet-insecure-tls","--kubelet-preferred-address-types=InternalIP"]}]}}}}'
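As a quick sanity check (a sketch only; it simply reads back the deployment patched above), the container args can be inspected to confirm the patch applied:

# Should print the args list, including --kubelet-insecure-tls
kubectl -n kube-system get deployment metrics-server -o jsonpath='{.spec.template.spec.containers[0].args}'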
Then apply the following deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oomkilled
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oomkilled
  template:
    metadata:
      labels:
        app: oomkilled
    spec:
      containers:
      - image: gcr.io/google-containers/stress:v1
        name: stress
        command: [ "/stress" ]
        args:
        - "--mem-total"
        - "104858000"
        - "--logtostderr"
        - "--mem-alloc-size"
        - "10000000"
        resources:
          requests:
            memory: 1Mi
            cpu: 5m
          limits:
            memory: 20Mi
What you expected to happen: the pod should switch to the "OOMKilled" status right after starting, but instead it keeps running fine.
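For reference, a hedged way to verify that expectation (the label selector comes from the deployment above; the jsonpath fields are standard pod status fields) is:

# List the pods from the deployment and print the last termination reason of
# the stress container; with an enforced 20Mi limit this should be OOMKilled.
kubectl get pods -l app=oomkilled
kubectl get pods -l app=oomkilled -o jsonpath='{.items[0].status.containerStatuses[0].lastState.terminated.reason}'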
Anything else we need to know?: I created a sister issue https://github.com/kubernetes-sigs/kind/issues/2848 which I should close soon.
Environment:
- Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.): kind v0.14.0 go1.18.2 linux/amd64
- Container Network Setup (flannel, calico, etc.):
- Kubernetes version (use kubectl version):
  WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
  Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T14:30:46Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
  Kustomize Version: v4.5.4
  Server Version: version.Info{Major:"1", Minor:"25+", GitVersion:"v1.25.0-alpha.0.881+7c127b33dafc53", GitCommit:"7c127b33dafc530f7ca0c165ddb47db86eb45880", GitTreeState:"clean", BuildDate:"2022-07-26T08:01:01Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
- Metrics Server manifest:
- Kubelet config:
- Metrics Server logs:
I0801 14:51:58.441090 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0801 14:51:59.193821 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0801 14:51:59.193841 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0801 14:51:59.193885 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0801 14:51:59.193914 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0801 14:51:59.193916 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0801 14:51:59.193930 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0801 14:51:59.194245 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0801 14:51:59.194357 1 secure_serving.go:266] Serving securely on [::]:4443
I0801 14:51:59.194397 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0801 14:51:59.194531 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I0801 14:51:59.294869 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0801 14:51:59.294910 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0801 14:51:59.294938 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
- Status of Metrics API:
kubectl describe apiservice v1beta1.metrics.k8s.io
Name: v1beta1.metrics.k8s.io
Namespace:
Labels: k8s-app=metrics-server
Annotations:
/kind bug
Sorry. I don't understand how this issue is related to metrics-server.
Do you mean that when the pod uses more memory than its limit, the status should be OOMKilled? That is not a function of metrics-server.
/kind support /remove-kind bug
/cc @sanwishe Could you see if this is related to the kubelet?
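For context, metrics-server only aggregates usage reported by each kubelet and serves it through the Metrics API; the OOM kill itself is enforced by the kernel against the cgroup limit set up by the kubelet and the container runtime. A minimal sketch for seeing what the Metrics API reports for this workload (assuming metrics-server is scraping correctly and the pod runs in the default namespace):

# Shows per-pod CPU/memory usage as served by the Metrics API
kubectl top pod -l app=oomkilled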
@mikelo Could you please provide the actual memory utilization of the stress container?
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/" | jq
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metadata": {
        "name": "oomkilled-85d9cf68b6-wfrt9",
        "namespace": "default",
        "creationTimestamp": "2022-08-02T15:16:21Z",
        "labels": {
          "app": "oomkilled",
          "pod-template-hash": "85d9cf68b6"
        }
      },
      "timestamp": "2022-08-02T15:16:00Z",
      "window": "15.953s",
      "containers": [
        {
          "name": "stress",
          "usage": {
            "cpu": "0",
            "memory": "19732Ki"
          }
        }
      ]
    }
  ]
}
I think the memory usage reported here is about 19Mi, which is still under the 20Mi limit, so there is no OOM. By the way, this metric is not associated with metrics-server.
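A further check worth trying (a sketch only; it assumes the stress image contains cat, which a minimal image may not, and the exact file name depends on whether the node runs cgroup v1 or v2) is to read the memory limit the container actually sees, to confirm the 20Mi limit was wired through:

# cgroup v2 exposes the limit as memory.max, cgroup v1 as memory.limit_in_bytes;
# an enforced 20Mi limit should show up as 20971520.
kubectl exec deploy/oomkilled -- cat /sys/fs/cgroup/memory.max 2>/dev/null \
  || kubectl exec deploy/oomkilled -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes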
Hi @mikelo!
I wanted to deliberately create a pod that would go "out of memory"
I used a classic example from the Kubernetes docs, ran it both on minikube and on kind, and reproduced your issue.
While minikube shows the OOMKilled status, kind somehow leaves the pod in the Running state.
1/ minikube - oomkilled as expected
$ minikube start --driver=kvm2
😄 minikube v1.25.2 on Fedora 36
- snip -
$ kubectl create namespace mem-example
namespace/mem-example created
$ kubectl apply -f https://k8s.io/examples/pods/resource/memory-request-limit-2.yaml --namespace=mem-example
pod/memory-demo-2 created
$ kubectl get pods -n mem-example
NAME READY STATUS RESTARTS AGE
memory-demo-2 0/1 OOMKilled 2 (23s ago) 30s
2/ kind - the pod is running
$ kind create cluster
enabling experimental podman provider
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.24.0) 🖼
- snip -
$ kubectl create namespace mem-example
namespace/mem-example created
$ kubectl apply -f https://k8s.io/examples/pods/resource/memory-request-limit-2.yaml --namespace=mem-example
pod/memory-demo-2 created
$ kubectl get pods -n mem-example
NAME READY STATUS RESTARTS AGE
memory-demo-2 1/1 Running 0 12s
I believe that the issue is not related to metrics-server but probably related to kind, so you could close this issue and continue the discussion in the kind repository.
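One way to narrow this down on the kind side (a sketch only; it assumes the default node name kind-control-plane, the podman provider used above, and that the node image ships crictl, which kindest/node images normally do; <container-id> is a placeholder) is to check what memory limit the runtime actually configured for the stress container:

# Find the stress container inside the kind node and inspect its resources;
# a missing or unenforced memory limit would explain the missing OOM kill.
podman exec kind-control-plane crictl ps --name stress
podman exec kind-control-plane crictl inspect <container-id> | grep -i memory
# On cgroup v2 hosts, also confirm the memory controller is available to the node:
podman exec kind-control-plane cat /sys/fs/cgroup/cgroup.controllers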
Yes, I agree, but in theory the application should take up about 104 MB and hence go OOM... If this is a kind issue, I should close this one.
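Spelling out the arithmetic behind that expectation (numbers taken from the manifest above): --mem-total 104858000 bytes is roughly 100 MiB, allocated in --mem-alloc-size 10000000-byte chunks, against a 20Mi (20971520-byte) limit, so an enforced limit should trigger the OOM killer after only a couple of allocations.

# Quick sanity check of the numbers from the deployment manifest
echo $((104858000 / 1048576))   # total requested: ~100 MiB
echo $((20971520 / 10000000))   # chunks that fit under the 20Mi limit: ~2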
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.