
Mega Issue: Karpenter Observability (metrics, logs, eventing, etc.)

jonathan-innis opened this issue 1 year ago

Description

What problem are you trying to solve?

As part of the journey to v1, I'd like us to consider the holistic story of what we are doing with our metrics, logging, eventing, and status fields (status conditions, etc.) across the codebase. Right now, we have been adding metrics, logging, and eventing piecemeal, but we haven't done a holistic review of the whole story or given recommendations on how users of Karpenter should monitor it and what they should alert on (beyond the Grafana dashboard in our documentation).

This issue is meant to be a mega-issue that captures all of the other issues in the repo that consider changes or improvements to the current metrics and monitoring story:

  • [x] https://github.com/kubernetes-sigs/karpenter/issues/493
  • [ ] https://github.com/kubernetes-sigs/karpenter/issues/1042
  • [x] https://github.com/kubernetes-sigs/karpenter/issues/689
  • [x] https://github.com/kubernetes-sigs/karpenter/issues/694
  • [x] https://github.com/kubernetes-sigs/karpenter/issues/371
  • [ ] https://github.com/kubernetes-sigs/karpenter/issues/695
  • [ ] https://github.com/kubernetes-sigs/karpenter/issues/705
  • [x] https://github.com/kubernetes-sigs/karpenter/issues/707
  • [x] https://github.com/kubernetes-sigs/karpenter/issues/724
  • [ ] https://github.com/kubernetes-sigs/karpenter/issues/736
  • [ ] https://github.com/kubernetes-sigs/karpenter/issues/854
  • [ ] https://github.com/kubernetes-sigs/karpenter/issues/781
  • [ ] https://github.com/kubernetes-sigs/karpenter/issues/611

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

jonathan-innis, Feb 28 '24

Also https://github.com/kubernetes-sigs/karpenter/issues/712

Bryce-Soghigian, Mar 25 '24

Surfacing the total node count per NodePool (likely via NodePool.status) has been requested (cf. Workgroup Meeting 2024-05-09).

tallaxes, May 09 '24
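Until such a count is surfaced on NodePool.status, it can be approximated from the nodes themselves. The sketch below is a minimal illustration, assuming that Karpenter-launched nodes carry the `karpenter.sh/nodepool` label; the `Node` type and `countNodesByNodePool` function are hypothetical stand-ins (a real controller would list `corev1.Node` objects through the Kubernetes API instead).

```go
package main

import (
	"fmt"
	"sort"
)

// nodePoolLabel is assumed to be the label Karpenter sets on nodes to
// record which NodePool launched them.
const nodePoolLabel = "karpenter.sh/nodepool"

// Node is a hypothetical, trimmed-down stand-in for a Kubernetes Node;
// only its labels matter for this count.
type Node struct {
	Name   string
	Labels map[string]string
}

// countNodesByNodePool returns the number of nodes per NodePool name.
// Nodes without the NodePool label (e.g. not managed by Karpenter) are skipped.
func countNodesByNodePool(nodes []Node) map[string]int {
	counts := make(map[string]int)
	for _, n := range nodes {
		if pool, ok := n.Labels[nodePoolLabel]; ok {
			counts[pool]++
		}
	}
	return counts
}

func main() {
	nodes := []Node{
		{Name: "a", Labels: map[string]string{nodePoolLabel: "default"}},
		{Name: "b", Labels: map[string]string{nodePoolLabel: "default"}},
		{Name: "c", Labels: map[string]string{nodePoolLabel: "gpu"}},
		{Name: "d", Labels: map[string]string{}}, // not managed by Karpenter
	}
	counts := countNodesByNodePool(nodes)

	// Sort pool names so the output order is deterministic.
	pools := make([]string, 0, len(counts))
	for p := range counts {
		pools = append(pools, p)
	}
	sort.Strings(pools)
	for _, p := range pools {
		fmt.Printf("%s: %d\n", p, counts[p])
	}
}
```

Counting from node labels keeps the logic independent of any status field; if the count were added to NodePool.status, the NodePool controller could populate it the same way.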

It would be really great if Karpenter could expose the total real-time cluster cost, like eks-node-viewer does. Here is its cluster summary, with the cluster cost on the right:

44 nodes (902794m/1056270m) 85.5% cpu ██████████████████████████████████░░░░░░ $25.082/hour | $18,310.152/month 
2,072 pods (44 pending 2,028 running 2,035 bound)

I am aware of the karpenter_cloudprovider_instance_type_offering_price_estimate metric, but I don't know how to calculate the cluster cost from it in Datadog.

jan-ludvik, Aug 05 '24

> metric, but I don't know how to calculate the cluster cost from it in Datadog

@jan-ludvik I guess that would end up being a cost estimate summed over all of the nodes. Honestly, we could probably just expose a per-node cost estimate through the node metrics, based on the pricing returned by the GetInstanceTypes() call.

jonathan-innis, Feb 18 '25
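The per-node summation discussed in the two comments above can be sketched as follows. This is a minimal illustration, not Karpenter's implementation. It assumes each price estimate is keyed by instance type, capacity type, and zone, and that nodes carry the `node.kubernetes.io/instance-type`, `karpenter.sh/capacity-type`, and `topology.kubernetes.io/zone` labels. The `Offering` and `Node` types, the `clusterCostPerHour` function, and the prices in `main` are hypothetical. The monthly figure uses roughly 730 hours per month, which is consistent with the hourly and monthly totals in the eks-node-viewer summary above.

```go
package main

import "fmt"

// Node labels assumed to identify a node's offering.
const (
	instanceTypeLabel = "node.kubernetes.io/instance-type"
	capacityTypeLabel = "karpenter.sh/capacity-type"
	zoneLabel         = "topology.kubernetes.io/zone"
)

// hoursPerMonth approximates an average month (365 * 24 / 12 = 730).
const hoursPerMonth = 730.0

// Offering is a hypothetical price key mirroring the assumed
// instance_type / capacity_type / zone dimensions of the price estimate.
type Offering struct {
	InstanceType, CapacityType, Zone string
}

// Node is a hypothetical stand-in for a Kubernetes Node (labels only).
type Node struct {
	Name   string
	Labels map[string]string
}

// clusterCostPerHour sums the estimated hourly price of every node whose
// offering appears in prices. Names of nodes with no known price are
// returned so the caller can tell that the estimate is incomplete.
func clusterCostPerHour(nodes []Node, prices map[Offering]float64) (total float64, unpriced []string) {
	for _, n := range nodes {
		key := Offering{
			InstanceType: n.Labels[instanceTypeLabel],
			CapacityType: n.Labels[capacityTypeLabel],
			Zone:         n.Labels[zoneLabel],
		}
		price, ok := prices[key]
		if !ok {
			unpriced = append(unpriced, n.Name)
			continue
		}
		total += price
	}
	return total, unpriced
}

func main() {
	// Made-up prices, for illustration only.
	prices := map[Offering]float64{
		{"m5.large", "on-demand", "us-west-2a"}: 0.10,
		{"m5.large", "spot", "us-west-2a"}:      0.04,
	}
	nodes := []Node{
		{Name: "n1", Labels: map[string]string{instanceTypeLabel: "m5.large", capacityTypeLabel: "on-demand", zoneLabel: "us-west-2a"}},
		{Name: "n2", Labels: map[string]string{instanceTypeLabel: "m5.large", capacityTypeLabel: "spot", zoneLabel: "us-west-2a"}},
		{Name: "n3", Labels: map[string]string{instanceTypeLabel: "c5.xlarge", capacityTypeLabel: "spot", zoneLabel: "us-west-2b"}},
	}
	hourly, unpriced := clusterCostPerHour(nodes, prices)
	fmt.Printf("$%.3f/hour | $%.3f/month (unpriced: %v)\n", hourly, hourly*hoursPerMonth, unpriced)
}
```

Returning the unpriced nodes rather than silently skipping them makes it visible when the total understates the real cost, for example when an offering has no price estimate.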

@jan-ludvik Would you mind opening the cluster cost estimate as a separate feature request? I'm going to go ahead and close this one out, because it has been open for a while as a Mega Issue and all the major items that were "closable" have now been addressed.

jonathan-innis, Mar 05 '25

/close

jonathan-innis, Mar 05 '25

@jonathan-innis: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot, Mar 05 '25

@jonathan-innis I already opened one: https://github.com/aws/karpenter-provider-aws/issues/6566. Can we reopen that, or should I make a new one?

jan-ludvik, Mar 27 '25