Add kubernetes.pod.status_reason and kubernetes.pod.status.ready_time fields in Kubernetes state_pod metricset

Proposed commit message

WHAT: Enhance the Kubernetes state_pod metricset with the kubernetes.pod.status_reason and kubernetes.pod.status.ready_time fields.
WHY: These are useful new metrics. status_reason indicates why a pod may not be in the desired state, and ready_time records when the pod became ready, from which the time it took the pod to become ready can be derived.
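kube-state-metrics exposes kube_pod_status_ready_time as the Unix timestamp at which a pod became ready, and kube_pod_created as the pod's creation timestamp. Assuming both series are scraped for the same pod, the time the pod needed to become ready is their difference. The Python sketch below parses exposition-format lines and computes that difference. The helper names and sample label values are hypothetical, the parser is deliberately simplified (it does not handle a closing brace inside a label value), and it is an illustration rather than the state_pod metricset's implementation.

```python
import re

# One exposition sample: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one exposition line into (name, labels dict, float value)."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    return m.group("name"), labels, float(m.group("value"))


def seconds_to_ready(lines):
    """Return {(namespace, pod): seconds between creation and readiness}."""
    created, ready = {}, {}
    for line in lines:
        name, labels, value = parse_sample(line)
        key = (labels.get("namespace"), labels.get("pod"))
        if name == "kube_pod_created":
            created[key] = value
        elif name == "kube_pod_status_ready_time":
            ready[key] = value
    # Only pods that expose both timestamps can be measured.
    return {key: ready[key] - created[key] for key in ready if key in created}


# Hypothetical sample scraped from kube-state-metrics.
sample = [
    'kube_pod_created{namespace="default",pod="web-1",uid="u1"} 1714465200',
    'kube_pod_status_ready_time{namespace="default",pod="web-1",uid="u1"} 1714465212',
]
print(seconds_to_ready(sample))  # {('default', 'web-1'): 12.0}
```

Keeping the raw timestamp in the event and deriving durations at query time avoids assuming that both series are always present in the same scrape.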

Checklist

[x] My code follows the style guidelines of this project
[ ] I have commented my code, particularly in hard-to-understand areas
[x] I have made corresponding changes to the documentation
[x] I have made corresponding changes to the default configuration files
[ ] I have added tests that prove my fix is effective or that my feature works
[x] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

[ ]

How to test this PR locally

Create a local k8s cluster: kind create cluster
Install the stack: elastic-package-0.98.2 stack up -d -v --version=8.14.0-SNAPSHOT
Edit dev-tools/kubernetes/Tiltfile to run in mode="run", then run:

cd dev-tools/kubernetes
tilt up

Create a pod with a low resources.limits.memory value so that it gets OOMKilled.
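A pod for this last step could look like the sketch below, which adapts the memory-stress example from the Kubernetes documentation (the polinux/stress image trying to allocate 250M under a 100Mi limit). The pod name memory-oom-test and the function oom_pod_manifest are hypothetical. The manifest is built as a Python dict and printed as JSON, a format kubectl apply -f also accepts.

```python
import json


def oom_pod_manifest(name="memory-oom-test", limit="100Mi", alloc="250M"):
    """Build a Pod whose container tries to allocate more memory than its limit."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "memory-demo-ctr",
                    "image": "polinux/stress",
                    "resources": {
                        "requests": {"memory": "50Mi"},
                        # The limit is lower than what stress allocates below.
                        "limits": {"memory": limit},
                    },
                    "command": ["stress"],
                    # One worker allocating `alloc` bytes and holding them.
                    "args": ["--vm", "1", "--vm-bytes", alloc, "--vm-hang", "1"],
                }
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(oom_pod_manifest(), indent=2))
```

Piping the output into kubectl apply -f - should create the pod; kubectl get pod memory-oom-test should then show the container being OOMKilled and restarted.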

Related issues

Relates https://github.com/elastic/beats/issues/39158
Relates https://github.com/elastic/integrations/issues/9752

Use cases

Screenshots

[Screenshot: ready time]

Logs

Note

I could not find any way to make kube_pod_status_reason take a value other than zero. It reports 1 only for the reasons Evicted, NodeAffinity, NodeLost, Shutdown and UnexpectedAdmissionError. I tried to reproduce such situations, but in every case the pod's failure reason was either Error or Unschedulable, and kube-state-metrics does not collect those reasons.
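kube-state-metrics emits one kube_pod_status_reason series per listed reason for each pod, with value 1 for the reason that currently applies and 0 for the others, so in the scenarios described here every series stays at 0. The Python sketch below, assuming that 0/1 encoding, picks the active reason per pod, which is one way a field like kubernetes.pod.status_reason could be populated. The function name and sample data are hypothetical; this is not the metricset's code.

```python
import re

LINE_RE = re.compile(r'^kube_pod_status_reason\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def active_reasons(lines):
    """Map (namespace, pod) -> reason whose gauge is 1, or None if all are 0."""
    result = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip other metrics and # HELP / # TYPE comments
        labels = dict(LABEL_RE.findall(m.group("labels")))
        key = (labels.get("namespace"), labels.get("pod"))
        result.setdefault(key, None)
        if float(m.group("value")) == 1:
            result[key] = labels.get("reason")
    return result


sample = [
    "# TYPE kube_pod_status_reason gauge",
    'kube_pod_status_reason{namespace="default",pod="web-1",reason="Evicted"} 0',
    'kube_pod_status_reason{namespace="default",pod="web-1",reason="NodeLost"} 0',
    'kube_pod_status_reason{namespace="default",pod="web-2",reason="Evicted"} 1',
    'kube_pod_status_reason{namespace="default",pod="web-2",reason="NodeLost"} 0',
]
print(active_reasons(sample))
# {('default', 'web-1'): None, ('default', 'web-2'): 'Evicted'}
```

Pods with no active reason map to None, matching the all-zero case observed in a local cluster.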

The problem, though, lies with Kubernetes rather than kube-state-metrics: when a pod is evicted, Kubernetes should set its status reason to Evicted, but that does not happen. Still, I recommend we introduce the new fields, as there may be cases where the status reason takes one of the expected values that I cannot reproduce in a local, non-production cluster.

Apr 30 '24 12:04 MichaelKatsoulis

This pull request does not have a backport label. If this is a bug or security fix, could you label this PR @MichaelKatsoulis? 🙏. For such, you'll need to label your PR with:

The upcoming major version of the Elastic Stack
The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed branches, such as:

backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

Apr 30 '24 12:04 mergify[bot]

:green_heart: Build Succeeded


Build stats

Start Time: 2024-05-08T08:14:46.589+0000
Duration: 100 min 48 sec

Test stats :test_tube:

Test	Results
Failed	0
Passed	4618
Skipped	904
Total	5522

:green_heart: Flaky test report

Tests succeeded.

:robot: GitHub comments


To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Apr 30 '24 14:04 elasticmachine
