
Metric kube_pod_container_status_last_terminated_reason does not report when a pod is terminated

Open amber-yan opened this issue 3 years ago • 11 comments

I did a test: I updated a deployment (via kubectl edit deployment) and observed that the old pod was terminated and a new pod was created.

I then checked kube_pod_container_status_last_terminated_reason and found that it did not report this.
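
For reference, this is roughly what I queried (the pod name is a made-up example); it returned no series for the replaced pod:

    kube_pod_container_status_last_terminated_reason{pod="my-app-6d4cf56db6-abcde"}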

amber-yan avatar Aug 12 '22 09:08 amber-yan

Was the pod deleted from the API server?

fpetkovski avatar Aug 16 '22 11:08 fpetkovski

Hi @fpetkovski, how can I check whether the pod was deleted from the API server? When I check with the command kubectl get pods, the pod is missing from the results.

amber

amber-yan avatar Aug 17 '22 01:08 amber-yan

KSM can only report data for resources that exist in Kubernetes. If kubectl get pods does not show the pod, it was removed from the API server and KSM has no way of knowing about it.

fpetkovski avatar Aug 17 '22 06:08 fpetkovski

If that is the case, is there any way to tell from the metrics that the pod was removed?

amber-yan avatar Aug 17 '22 07:08 amber-yan

You can check the kube_pod_info metric and see when the pod disappeared.
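
For example, a minimal sketch (the 10m window is arbitrary): this returns the kube_pod_info series that existed ten minutes ago but are no longer being reported, i.e. pods that disappeared within that window:

    kube_pod_info offset 10m unless kube_pod_info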

fpetkovski avatar Aug 17 '22 07:08 fpetkovski

If I want to capture all crash events, can I ignore the case where the pod is removed? When an application crashes, does Kubernetes keep the pod and restart the container, or does it terminate the existing pod and create a new one?

amber-yan avatar Aug 17 '22 08:08 amber-yan

It will recreate the crashed container and keep the existing pod, so crashes show up as restarts of the container within the same pod.
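
A minimal sketch for spotting crashed containers, then, is to watch the restart counter and join it with the last-terminated reason (the 1h window is arbitrary):

    kube_pod_container_status_last_terminated_reason == 1
      and on (namespace, pod, container)
    increase(kube_pod_container_status_restarts_total[1h]) > 0

This keeps the reason label on the result, for containers that restarted in the last hour.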

fpetkovski avatar Aug 17 '22 08:08 fpetkovski

Is there any way to get the termination time from the metric kube_pod_container_status_last_terminated_reason? Thanks.

amber-yan avatar Aug 18 '22 04:08 amber-yan

If you are scraping KSM, you will know when the pod went away by looking at when kube_pod_info for the pod stops being reported.
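
A sketch of that kind of lookup (the pod name is a made-up example): the last time the series was seen within the past day can be recovered with a subquery:

    max_over_time(timestamp(kube_pod_info{pod="my-app-6d4cf56db6-abcde"})[1d:1m])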

fpetkovski avatar Aug 18 '22 06:08 fpetkovski

Sorry, I didn't describe my question clearly. I want to detect crashes, so I need the KSM metric kube_pod_container_status_last_terminated_reason to get the container termination reason and time. When I tried it, I could only get the pod, container, hana-tooling, etc. labels, and the value seems to be the termination count. The termination timestamp is not included.

amber-yan avatar Aug 18 '22 08:08 amber-yan

Having the termination timestamp as a label would create extremely high cardinality, which is why it is not added to any of the metrics.

For this we would need a new metric, kube_pod_container_status_last_terminated_timestamp, that has the timestamp as its value, but I would make that a new issue that we can discuss separately.
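
To be clear, that metric is hypothetical at this point. If it existed, the termination time would be queryable directly, e.g. containers terminated within the last hour:

    # kube_pod_container_status_last_terminated_timestamp is hypothetical (does not exist yet);
    # its value would be the unix timestamp of the container's last termination
    time() - kube_pod_container_status_last_terminated_timestamp < 3600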

fpetkovski avatar Aug 18 '22 08:08 fpetkovski

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 16 '22 09:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 16 '22 10:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jan 15 '23 11:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 15 '23 11:01 k8s-ci-robot

Hey @fpetkovski, I am also trying to achieve the same thing as described above:

I would like to get the crash situation, so I need to use the KSM kube_pod_container_status_last_terminated_reason to get the container termination reason and time.

And I would be interested in a metric such as kube_pod_container_status_last_terminated_timestamp, as you mentioned:

For this we would need a new metric kube_pod_container_status_last_terminated_timestamp that has the timestamp as a value, but I would make that a new issue that we can discuss separately.

Should I first open an issue for that? Or was there already a similar discussion about it?

tetianakravchenko avatar Dec 27 '23 16:12 tetianakravchenko