Add kubernetes.pod.status_reason and kubernetes.pod.status.ready_time fields in Kubernetes state_pod metricset
Proposed commit message
- WHAT: Enhance the Kubernetes state_pod metricset with `kubernetes.pod.status_reason` and `kubernetes.pod.status.ready_time` fields.
- WHY: Useful new metrics that indicate the reason a pod might not be in a desired state and the time it took for a pod to become ready.
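For illustration, a sketch of how the two fields might appear in an emitted event, inferred purely from the field names above — the reason value, the timestamp, and the exact value format are hypothetical:

```json
{
  "kubernetes": {
    "pod": {
      "status_reason": "Evicted",
      "status": {
        "ready_time": "2024-05-08T08:15:30.000Z"
      }
    }
  }
}
```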
Checklist
- [x] My code follows the style guidelines of this project
- [ ] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] I have made corresponding changes to the default configuration files
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
How to test this PR locally
- Create a local k8s cluster: `kind create cluster`
- Install the stack: `elastic-package-0.98.2 stack up -d -v --version=8.14.0-SNAPSHOT`
- Edit `dev-tools/kubernetes/Tiltfile` to run in `mode="run"` and run: `cd dev-tools/kubernetes && tilt up`
- Create a pod with a low resources.limits.memory to cause OOMKilled (see the manifest sketch below)
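As a sketch of that last step, the following is one way to trigger an OOMKilled state; the pod name, image, and memory sizes are illustrative and not part of this PR:

```sh
# A pod whose container allocates more memory than its limit,
# so the kubelet reports it as OOMKilled.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: oom-test
spec:
  restartPolicy: Never
  containers:
  - name: stress
    image: polinux/stress
    command: ["stress", "--vm", "1", "--vm-bytes", "200M", "--vm-hang", "1"]
    resources:
      limits:
        memory: "100Mi"   # lower than the ~200M the container tries to allocate
EOF
```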
Related issues
- Relates https://github.com/elastic/beats/issues/39158
- Relates https://github.com/elastic/integrations/issues/9752
Note
I did not manage to find any way to make kube_pod_status_reason report a value other than zero. The reasons that can yield a value of 1 are Evicted, NodeAffinity, NodeLost, Shutdown, and UnexpectedAdmissionError.
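For context, kube-state-metrics exposes kube_pod_status_reason as one series per possible reason, set to 1 when that reason applies. A sketch of the exposition output for a pod where none applies (namespace and pod names are hypothetical, and the uid label is omitted):

```
kube_pod_status_reason{namespace="default",pod="oom-test",reason="Evicted"} 0
kube_pod_status_reason{namespace="default",pod="oom-test",reason="NodeAffinity"} 0
kube_pod_status_reason{namespace="default",pod="oom-test",reason="NodeLost"} 0
kube_pod_status_reason{namespace="default",pod="oom-test",reason="Shutdown"} 0
kube_pod_status_reason{namespace="default",pod="oom-test",reason="UnexpectedAdmissionError"} 0
```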
I tried to create such a situation, but in all cases the pod's failure reason was either Error or Unschedulable.
These specific reasons are not collected by kube-state-metrics.
The problem, though, is not related to kube-state-metrics but to Kubernetes itself. When a pod gets evicted, Kubernetes should set the status reason to Evicted, but that does not happen.
Still, I recommend we introduce the new fields, as there may be cases where the status reason takes one of the expected values that I cannot reproduce in a local, non-production cluster.
This pull request does not have a backport label. If this is a bug or security fix, could you label this PR @MichaelKatsoulis? 🙏. For such, you'll need to label your PR with:
- The upcoming major version of the Elastic Stack
- The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)
To fix up this pull request, you need to add the backport labels for the needed branches, such as:
- `backport-v8./d.0` is the label to automatically backport to the `8./d` branch, where `/d` is the digit (for example, backport-v8.14.0 targets the 8.14 branch).
:green_heart: Build Succeeded
Build stats
- Start Time: 2024-05-08T08:14:46.589+0000
- Duration: 100 min 48 sec
Test stats :test_tube:
| Test | Results |
|---|---|
| Failed | 0 |
| Passed | 4618 |
| Skipped | 904 |
| Total | 5522 |
:green_heart: Flaky test report
Tests succeeded.
:robot: GitHub comments
To re-run your PR in the CI, just comment with:
- `/test`: Re-trigger the build.
- `/package`: Generate the packages and run the E2E tests.
- `/beats-tester`: Run the installation tests with beats-tester.
- `run elasticsearch-ci/docs`: Re-trigger the docs validation. (use unformatted text in the comment!)