[Query] Client metrics/monitoring

Open weiqingy opened this issue 3 years ago • 7 comments
We have been developing our operators using the Kubernetes Java client and want to define some metrics for them, e.g. a health metric, a lag metric, etc.

The Kubernetes Java client uses Prometheus metrics (example). Are there any metrics already defined by the client that we can leverage directly (e.g. the number of requests in the work queue)?

Thanks! Weiqing

weiqingy avatar Apr 13 '22 07:04 weiqingy

For controller-runtime in Go, it seems there are metrics exported by default: https://github.com/kubernetes-sigs/kubebuilder/blob/c0a0bb6dc05e04332cb7b460bef5f155b53770cc/docs/book/src/reference/metrics-reference.md

I wonder if the Java client exports default metrics as well?

weiqingy avatar Apr 15 '22 21:04 weiqingy

All of the metrics that we have are defined here:

https://github.com/kubernetes-client/java/blob/master/util/src/main/java/io/kubernetes/client/monitoring/PrometheusInterceptor.java

I think that we would be open to adding more.

brendandburns avatar Apr 18 '22 21:04 brendandburns

Thanks for the reply, @brendandburns! The metrics described at https://github.com/kubernetes-sigs/kubebuilder/blob/c0a0bb6dc05e04332cb7b460bef5f155b53770cc/docs/book/src/reference/metrics-reference.md are useful. Could they be added to the Java client?

weiqingy avatar Apr 21 '22 05:04 weiqingy

Hi @brendandburns ,

In https://github.com/kubernetes-client/java/blob/master/util/src/main/java/io/kubernetes/client/monitoring/PrometheusInterceptor.java, it seems the following metrics are defined:

k8s_java_requests_total
k8s_java_response_code_total
k8s_java_resource_request_latency_seconds
k8s_java__non_resource_request_latency_seconds

Besides the text passed to help(), is there a wiki or doc with more detail on the metric definitions? For example: Is "k8s_java_requests_total" the total count of requests in the work queue, or the total number of requests the controller has reconciled so far? Is k8s_java_resource_request_latency_seconds the time from when a user change is received to when it propagates out of the controller? What is a non_resource_request? What does the response code total count?

Thanks!

weiqingy avatar May 12 '22 06:05 weiqingy

Hey @brendandburns and @yue9944882,

Could you please take a look at the questions above when you get a chance? I'd really appreciate it.

Thanks, Weiqing

weiqingy avatar Jun 13 '22 20:06 weiqingy

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 11 '22 21:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Oct 11 '22 21:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Nov 10 '22 21:11 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 10 '22 21:11 k8s-ci-robot

@weiqingy, were you able to find answers to these questions and use the metrics?

prash-kr-meena avatar Apr 21 '23 09:04 prash-kr-meena
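
The questions above about what the counters measure were never answered in the thread. As a rough illustration only, here is a minimal sketch assuming (as the class name PrometheusInterceptor suggests, but which nobody here confirmed) that the listed metrics are updated once per HTTP call to the API server rather than per work-queue item. The class RequestMetricsSketch and all of its methods are hypothetical, use only JDK types, and are not part of the Kubernetes Java client. Under that assumption, a work-queue depth would have to be a separate gauge maintained by the operator itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch, NOT the client's PrometheusInterceptor.
// It mimics per-HTTP-call counters with plain JDK types.
class RequestMetricsSketch {
    // Similar in spirit to k8s_java_requests_total (assumed): one increment per API call.
    private final Map<String, LongAdder> requestsTotal = new ConcurrentHashMap<>();
    // Similar in spirit to k8s_java_response_code_total (assumed): keyed by resource and status.
    private final Map<String, LongAdder> responseCodeTotal = new ConcurrentHashMap<>();
    // A separate operator-side gauge; nothing in the thread says the client provides one.
    private final AtomicLong workQueueDepth = new AtomicLong();

    // Called once after each HTTP response, which is what an interceptor would do.
    void record(String resource, int statusCode) {
        requestsTotal.computeIfAbsent(resource, k -> new LongAdder()).increment();
        responseCodeTotal
            .computeIfAbsent(resource + "|" + statusCode, k -> new LongAdder())
            .increment();
    }

    // The operator would call these around its own work queue.
    void enqueue() { workQueueDepth.incrementAndGet(); }
    void dequeue() { workQueueDepth.decrementAndGet(); }

    long requests(String resource) {
        LongAdder adder = requestsTotal.get(resource);
        return adder == null ? 0 : adder.sum();
    }

    long responses(String resource, int statusCode) {
        LongAdder adder = responseCodeTotal.get(resource + "|" + statusCode);
        return adder == null ? 0 : adder.sum();
    }

    long queueDepth() { return workQueueDepth.get(); }

    public static void main(String[] args) {
        RequestMetricsSketch metrics = new RequestMetricsSketch();
        metrics.record("pods", 200);
        metrics.record("pods", 404);
        metrics.enqueue();
        System.out.println("requests(pods) = " + metrics.requests("pods"));
        System.out.println("responses(pods, 404) = " + metrics.responses("pods", 404));
        System.out.println("queueDepth = " + metrics.queueDepth());
    }
}
```

The point of the sketch is the separation: the request and response-code counters only ever grow and move with API traffic, while the queue depth goes up and down with the operator's backlog. Reading the help() strings and label names in PrometheusInterceptor.java is still the way to confirm what the real metrics count.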