karpenter
Mega Issue: Karpenter Observability (metrics, logs, eventing, etc.)
Description
What problem are you trying to solve?
As part of the journey to v1, I'd like us to consider the holistic story of what we are doing with our metrics, logging, eventing, and status fields (status conditions, etc.) across the codebase. Right now, we have been adding metrics, logging, and eventing piecemeal, but we haven't had a holistic review of the whole story or given recommendations on how users of Karpenter should monitor it and what they should alert on (outside of the Grafana dashboard in our documentation).
This issue is meant to be a mega-issue for capturing all of the other issues in the repo that are considering changes or improvements to the current metrics and monitoring story:
- [x] https://github.com/kubernetes-sigs/karpenter/issues/493
- [ ] https://github.com/kubernetes-sigs/karpenter/issues/1042
- [x] https://github.com/kubernetes-sigs/karpenter/issues/689
- [x] https://github.com/kubernetes-sigs/karpenter/issues/694
- [x] https://github.com/kubernetes-sigs/karpenter/issues/371
- [ ] https://github.com/kubernetes-sigs/karpenter/issues/695
- [ ] https://github.com/kubernetes-sigs/karpenter/issues/705
- [x] https://github.com/kubernetes-sigs/karpenter/issues/707
- [x] https://github.com/kubernetes-sigs/karpenter/issues/724
- [ ] https://github.com/kubernetes-sigs/karpenter/issues/736
- [ ] https://github.com/kubernetes-sigs/karpenter/issues/854
- [ ] https://github.com/kubernetes-sigs/karpenter/issues/781
- [ ] https://github.com/kubernetes-sigs/karpenter/issues/611
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Also https://github.com/kubernetes-sigs/karpenter/issues/712
Surfacing total node count per NodePool - likely via NodePool.status - has been requested (cf. Workgroup Meeting 2024-05-09)
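For illustration only, here is a minimal sketch of the aggregation behind a per-NodePool node count. It assumes each node carries the `karpenter.sh/nodepool` label and works on plain label maps instead of real Kubernetes objects. The function name `CountNodesByNodePool` and the sample data are hypothetical, and this is not how a `NodePool.status` field would necessarily be populated.

```go
package main

import "fmt"

// nodePoolLabel is the node label assumed to identify the owning NodePool.
const nodePoolLabel = "karpenter.sh/nodepool"

// CountNodesByNodePool (hypothetical helper) tallies nodes per NodePool.
// Each element of nodeLabels is the label set of one node. Nodes without
// the NodePool label are skipped, since they are not attributed to a pool.
func CountNodesByNodePool(nodeLabels []map[string]string) map[string]int {
	counts := make(map[string]int)
	for _, labels := range nodeLabels {
		if pool, ok := labels[nodePoolLabel]; ok {
			counts[pool]++
		}
	}
	return counts
}

func main() {
	// Sample label sets; the pool names are made up for the example.
	nodes := []map[string]string{
		{nodePoolLabel: "default"},
		{nodePoolLabel: "default"},
		{nodePoolLabel: "gpu"},
		{"kubernetes.io/os": "linux"}, // no NodePool label, so not counted
	}
	for pool, n := range CountNodesByNodePool(nodes) {
		fmt.Printf("nodepool=%s nodes=%d\n", pool, n)
	}
}
```

In a real controller the label sets would come from listing Node objects, and the counts would be written to status or exported as a gauge.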
It would be really great if Karpenter could expose total real-time cluster cost like eks-node-viewer does.
Here is the cluster summary, with the cluster cost on the right:
44 nodes (902794m/1056270m) 85.5% cpu ██████████████████████████████████░░░░░░ $25.082/hour | $18,310.152/month
2,072 pods (44 pending 2,028 running 2,035 bound)
I am aware of the `karpenter_cloudprovider_instance_type_offering_price_estimate` metric, but I don't know how to calculate cluster cost from it in Datadog.
> metric but I don't know how calculate cluster cost from that in Datadog
@jan-ludvik I guess that would end up being a cost estimate for all of the nodes -- honestly, we could probably just expose a cost estimate through the node metrics based on the pricing that's given back through the GetInstanceTypes() call
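A rough sketch of what such a cluster cost estimate could look like: sum a per-offering price estimate over all nodes. Everything here is an assumption made for illustration. The `Offering` type, the `EstimateHourlyCost` function, the instance type names, and the prices are hypothetical and are not Karpenter's API. The 730 hours per month factor is also an assumption, chosen because it is consistent with the eks-node-viewer figures quoted above.

```go
package main

import "fmt"

// Offering (hypothetical type) identifies one priced offering: an instance
// type in a given zone with a given capacity type.
type Offering struct {
	InstanceType string
	Zone         string
	CapacityType string // e.g. "on-demand" or "spot"
}

// hoursPerMonth is an assumed conversion factor from hourly to monthly cost.
const hoursPerMonth = 730.0

// EstimateHourlyCost (hypothetical helper) sums the hourly price estimate of
// every node. Nodes whose offering has no known price are not added to the
// total; they are counted and returned so the caller can see how incomplete
// the estimate is.
func EstimateHourlyCost(nodes []Offering, prices map[Offering]float64) (total float64, unpriced int) {
	for _, n := range nodes {
		price, ok := prices[n]
		if !ok {
			unpriced++
			continue
		}
		total += price
	}
	return total, unpriced
}

func main() {
	// Made-up offerings and prices, purely for the example.
	onDemand := Offering{InstanceType: "example.large", Zone: "zone-a", CapacityType: "on-demand"}
	spot := Offering{InstanceType: "example.large", Zone: "zone-a", CapacityType: "spot"}
	unknown := Offering{InstanceType: "example.xlarge", Zone: "zone-b", CapacityType: "on-demand"}

	prices := map[Offering]float64{
		onDemand: 0.5,
		spot:     0.25,
	}
	nodes := []Offering{onDemand, onDemand, spot, unknown}

	hourly, unpriced := EstimateHourlyCost(nodes, prices)
	fmt.Printf("$%.3f/hour | $%.3f/month (nodes without a price: %d)\n",
		hourly, hourly*hoursPerMonth, unpriced)
}
```

The same sum could presumably be expressed in a metrics backend by joining a per-node series against the price estimate on instance type, zone, and capacity type, but that depends on which labels the node metrics actually carry.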
@jan-ludvik Would you mind opening the cluster cost estimate one as a separate feature ask? I'm going to go ahead and close this one out because it's been a Mega Issue that's been open for a while and all the major things that were "closable" have been addressed now
/close
@jonathan-innis: Closing this issue.
@jonathan-innis I already had one. Can we reopen that, or should I open a new one? https://github.com/aws/karpenter-provider-aws/issues/6566