karmada icon indicating copy to clipboard operation
karmada copied to clipboard

Expose default port for monitoring, in case of create PodMonitor or S…

Open wangxf1987 opened this issue 1 year ago • 21 comments

…erviceMonitor

What type of PR is this? This is improve for installation, the port of karmada-apiserver is exposed, but other componment is not.

What this PR does / why we need it:

Which issue(s) this PR fixes: Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Expose the default port  for the karmada-controller-manager, scheduler and agent when creating a PodMonitor.

wangxf1987 avatar Apr 28 '24 03:04 wangxf1987

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: To complete the pull request process, please assign poor12 after the PR has been reviewed. You can assign the PR to them by writing /assign @poor12 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot avatar Apr 28 '24 03:04 karmada-bot

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 53.07%. Comparing base (3232c52) to head (a4d9be4). Report is 13 commits behind head on master.

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4877      +/-   ##
==========================================
+ Coverage   53.05%   53.07%   +0.01%     
==========================================
  Files         250      251       +1     
  Lines       20396    20389       -7     
==========================================
- Hits        10822    10821       -1     
+ Misses       8855     8854       -1     
+ Partials      719      714       -5     
Flag Coverage Δ
unittests 53.07% <ø> (+0.01%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Apr 28 '24 03:04 codecov-commenter

/assign @chaosi-zju

RainbowMango avatar Apr 28 '24 07:04 RainbowMango

Hi! @wangxf1987 Glad to see your contribution!

The CI of this PR failed due to it wasn't signed off, usually please use git commit -s -m 'your message ' or git commit -m ' Signed-off-by: AuthorName <[email protected]> \n <other message> ' to pass DCO.

Detail guideline can refer to: https://probot.github.io/apps/dco/

chaosi-zju avatar Apr 28 '24 07:04 chaosi-zju

Hi! @wangxf1987 Glad to see your contribution!

The CI of this PR failed due to it wasn't signed off, usually please use git commit -s -m 'your message ' or git commit -m ' Signed-off-by: AuthorName <[email protected]> \n <other message> ' to pass DCO.

Detail guideline can refer to: https://probot.github.io/apps/dco/

done

wangxf1987 avatar Apr 29 '24 11:04 wangxf1987

/lgtm

chaosi-zju avatar Apr 30 '24 01:04 chaosi-zju

Can you write the release-notes? @wangxf1987

chaunceyjiang avatar May 08 '24 07:05 chaunceyjiang

Can you write the release-notes? @wangxf1987 @chaunceyjiang Ok, i should modify the release notes for that version, is it https://github.com/karmada-io/karmada/blob/master/docs/CHANGELOG/CHANGELOG-0.10.md ?

wangxf1987 avatar May 09 '24 02:05 wangxf1987

image @wangxf1987 Here. In your PR description

chaunceyjiang avatar May 13 '24 02:05 chaunceyjiang

Expose the default port for the karmada-controller-manager, scheduler and agent when creating a PodMonitor.

done

wangxf1987 avatar May 13 '24 06:05 wangxf1987

Is this ServiceMonitor and PodMonitor?

By the way, this PR focuses on exposing ports in the helm chart template, what about operators and cli? Should we keep them consistent? cc @chaosi-zju

RainbowMango avatar May 14 '24 06:05 RainbowMango

Let me talk about my understanding of this PR:

First, our karmada-controller-manager does listened 8080 port as metrics querying API, we defined it in https://github.com/karmada-io/karmada/blob/e6c92d5cba0edfa81e920bfff98db1b4be257d82/cmd/controller-manager/app/controllermanager.go#L161

we can try as:

$ kubectl --context karmada-host exec -it karmada-controller-manager-cdbbf75dd-69fxb -n karmada-system -- sh
/ # curl http://127.0.0.1:8080/metrics
.....
work_sync_workload_duration_seconds_bucket{result="success",le="0.512"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="1.024"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="2.048"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="+Inf"} 4
work_sync_workload_duration_seconds_sum{result="success"} 0.075089723
work_sync_workload_duration_seconds_count{result="success"} 4

even without this PR, we can still access /metrics API of karmada-controller-manager from other pod, just like:

$ kubectl --context karmada-host exec -it karmada-descheduler-7d654cbff6-5jmrt -n karmada-system -- sh

# here, `10.244.0.12` is the pod IP of `karmada-controller-manager`.
/ # curl http://10.244.0.12:8080/metrics
.....
work_sync_workload_duration_seconds_bucket{result="success",le="0.512"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="1.024"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="2.048"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="+Inf"} 4
work_sync_workload_duration_seconds_sum{result="success"} 0.075089723
work_sync_workload_duration_seconds_count{result="success"} 4

So, the normal operation of the /metrics API does not originally rely on the content proposed by this PR.


However, @wangxf1987 has needs to use Prometheus to collect metrics, as described in this tutorial: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md#using-podmonitors

take PodMonitor as an example:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  podMetricsEndpoints:
  - port: web

In practice, the spec.selector label tells Prometheus which Pods should be scraped, while spec.podMetricsEndpoints defines an endpoint serving Prometheus metrics to be scraped by Prometheus.

Pay attention to spec.podMetricsEndpoints:

https://github.com/prometheus-operator/prometheus-operator/blob/5b880a05ab72112ff26ac3c3eb5bffddbafbc468/pkg/apis/monitoring/v1/podmonitor_types.go#L169-L184

the official API recommends to use port field, which represents Name of the Pod port which this endpoint refers to, instead of deprecated filed targetPort which represents Name or number of the target port of the Pod object behind the Service.

So, here comes the problem, we didn't specify the port field in container description of karmada-controller-manager and give it a name, which @wangxf1987 can reference it in spec.podMetricsEndpoints of PodMonitor.

this is the reason for why this PR proposed.

Hi @wangxf1987, am I right?

chaosi-zju avatar May 14 '24 09:05 chaosi-zju

By the way, this PR focuses on exposing ports in the helm chart template, what about operators and cli? Should we keep them consistent?

If we all agree on the necessity of this PR, then we should build an issue to make up for the corresponding modifications of other installation methods.

chaosi-zju avatar May 14 '24 09:05 chaosi-zju

Is this ServiceMonitor and PodMonitor?

By the way, this PR focuses on exposing ports in the helm chart template, what about operators and cli? Should we keep them consistent? cc @chaosi-zju

Expose this port for PodMonitor only

wangxf1987 avatar May 14 '24 13:05 wangxf1987

Let me talk about my understanding of this PR:

First, our karmada-controller-manager does listened 8080 port as metrics querying API, we defined it in

https://github.com/karmada-io/karmada/blob/e6c92d5cba0edfa81e920bfff98db1b4be257d82/cmd/controller-manager/app/controllermanager.go#L161

we can try as:

$ kubectl --context karmada-host exec -it karmada-controller-manager-cdbbf75dd-69fxb -n karmada-system -- sh
/ # curl http://127.0.0.1:8080/metrics
.....
work_sync_workload_duration_seconds_bucket{result="success",le="0.512"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="1.024"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="2.048"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="+Inf"} 4
work_sync_workload_duration_seconds_sum{result="success"} 0.075089723
work_sync_workload_duration_seconds_count{result="success"} 4

even without this PR, we can still access /metrics API of karmada-controller-manager from other pod, just like:

$ kubectl --context karmada-host exec -it karmada-descheduler-7d654cbff6-5jmrt -n karmada-system -- sh

# here, `10.244.0.12` is the pod IP of `karmada-controller-manager`.
/ # curl http://10.244.0.12:8080/metrics
.....
work_sync_workload_duration_seconds_bucket{result="success",le="0.512"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="1.024"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="2.048"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="+Inf"} 4
work_sync_workload_duration_seconds_sum{result="success"} 0.075089723
work_sync_workload_duration_seconds_count{result="success"} 4

So, the normal operation of the /metrics API does not originally rely on the content proposed by this PR.

However, @wangxf1987 has needs to use Prometheus to collect metrics, as described in this tutorial: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md#using-podmonitors

take PodMonitor as an example:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  podMetricsEndpoints:
  - port: web

In practice, the spec.selector label tells Prometheus which Pods should be scraped, while spec.podMetricsEndpoints defines an endpoint serving Prometheus metrics to be scraped by Prometheus.

Pay attention to spec.podMetricsEndpoints:

https://github.com/prometheus-operator/prometheus-operator/blob/5b880a05ab72112ff26ac3c3eb5bffddbafbc468/pkg/apis/monitoring/v1/podmonitor_types.go#L169-L184

the official API recommends to use port field, which represents Name of the Pod port which this endpoint refers to, instead of deprecated filed targetPort which represents Name or number of the target port of the Pod object behind the Service.

So, here comes the problem, we didn't specify the port field in container description of karmada-controller-manager and give it a name, which @wangxf1987 can reference it in spec.podMetricsEndpoints of PodMonitor.

this is the reason for why this PR proposed.

Hi @wangxf1987, am I right?

Yes, the por name needs to be specified in PM, so the metrices port name needs to be defined in the contaienr.

wangxf1987 avatar May 14 '24 13:05 wangxf1987

I think we also need a new PR describing how to monitor the search component. Similarly, you need to modify the port name in the search service.

---
apiVersion: v1
kind: Service
metadata:
  name: {{ $name }}-search
  namespace: {{ include "karmada.namespace" . }}
  labels:
    {{- include "karmada.search.labels" . | nindent 4 }}
spec:
  ports:
    - name: https # add this
      port: 443
      protocol: TCP
      targetPort: 443
  selector:
    {{- include "karmada.search.labels" . | nindent 4 }}

wangxf1987 avatar May 14 '24 13:05 wangxf1987

Let me talk about my understanding of this PR: First, our karmada-controller-manager does listened 8080 port as metrics querying API, we defined it in https://github.com/karmada-io/karmada/blob/e6c92d5cba0edfa81e920bfff98db1b4be257d82/cmd/controller-manager/app/controllermanager.go#L161

we can try as:

$ kubectl --context karmada-host exec -it karmada-controller-manager-cdbbf75dd-69fxb -n karmada-system -- sh
/ # curl http://127.0.0.1:8080/metrics
.....
work_sync_workload_duration_seconds_bucket{result="success",le="0.512"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="1.024"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="2.048"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="+Inf"} 4
work_sync_workload_duration_seconds_sum{result="success"} 0.075089723
work_sync_workload_duration_seconds_count{result="success"} 4

even without this PR, we can still access /metrics API of karmada-controller-manager from other pod, just like:

$ kubectl --context karmada-host exec -it karmada-descheduler-7d654cbff6-5jmrt -n karmada-system -- sh

# here, `10.244.0.12` is the pod IP of `karmada-controller-manager`.
/ # curl http://10.244.0.12:8080/metrics
.....
work_sync_workload_duration_seconds_bucket{result="success",le="0.512"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="1.024"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="2.048"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="+Inf"} 4
work_sync_workload_duration_seconds_sum{result="success"} 0.075089723
work_sync_workload_duration_seconds_count{result="success"} 4

So, the normal operation of the /metrics API does not originally rely on the content proposed by this PR. However, @wangxf1987 has needs to use Prometheus to collect metrics, as described in this tutorial: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md#using-podmonitors take PodMonitor as an example:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  podMetricsEndpoints:
  - port: web

In practice, the spec.selector label tells Prometheus which Pods should be scraped, while spec.podMetricsEndpoints defines an endpoint serving Prometheus metrics to be scraped by Prometheus. Pay attention to spec.podMetricsEndpoints: https://github.com/prometheus-operator/prometheus-operator/blob/5b880a05ab72112ff26ac3c3eb5bffddbafbc468/pkg/apis/monitoring/v1/podmonitor_types.go#L169-L184 the official API recommends to use port field, which represents Name of the Pod port which this endpoint refers to, instead of deprecated filed targetPort which represents Name or number of the target port of the Pod object behind the Service. So, here comes the problem, we didn't specify the port field in container description of karmada-controller-manager and give it a name, which @wangxf1987 can reference it in spec.podMetricsEndpoints of PodMonitor. this is the reason for why this PR proposed. Hi @wangxf1987, am I right?

Yes, the por name needs to be specified in PM, so the metrices port name needs to be defined in the contaienr.

when the port is defined by default, it it used by prothumes. like this. image image

wangxf1987 avatar May 14 '24 13:05 wangxf1987

I think we also need a new PR describing how to monitor the search component. Similarly, you need to modify the port name in the search service.

---
apiVersion: v1
kind: Service
metadata:
  name: {{ $name }}-search
  namespace: {{ include "karmada.namespace" . }}
  labels:
    {{- include "karmada.search.labels" . | nindent 4 }}
spec:
  ports:
    - name: https # add this
      port: 443
      protocol: TCP
      targetPort: 443
  selector:
    {{- include "karmada.search.labels" . | nindent 4 }}

image

wangxf1987 avatar May 14 '24 13:05 wangxf1987

It looks really fantastic! 👍

Did you do this monitoring through the existing karmada document or create it through your personal way?

By the way, I actually very much hope you can share the complete operation steps of your build monitoring in the form of documents ٩(๑❛ᴗ❛๑)۶. If you also willing to contribute a document to karmada, you can refer to this doc.

chaosi-zju avatar May 14 '24 14:05 chaosi-zju

@chaosi-zju Would you like to run a test as per this document, and share it with us at a community meeting?

RainbowMango avatar May 16 '24 02:05 RainbowMango

Would you like to run a test as per this document, and share it with us at a community meeting?

OK

chaosi-zju avatar May 16 '24 02:05 chaosi-zju

/retitle Expose containing port of serving metrics

RainbowMango avatar May 25 '24 03:05 RainbowMango

Hi @wangxf1987 I still think this a meaningful improvement, and @chaosi-zju shared a demo based on your PR at the community meeting.

The only one comment I left at https://github.com/karmada-io/karmada/pull/4877#discussion_r1614304403. And sure, we hope each component could have a named container port and that can be done by following PR.

@wangxf1987 Would you like to continue with it?

RainbowMango avatar Jul 03 '24 08:07 RainbowMango

Hi @wangxf1987 I still think this a meaningful improvement, and @chaosi-zju shared a demo based on your PR at the community meeting.

The only one comment I left at #4877 (comment). And sure, we hope each component could have a named container port and that can be done by following PR.

@wangxf1987 Would you like to continue with it?

sorry,my mistake caused the branch to be deleted, I will be citunue this PR.

wangxf1987 avatar Jul 04 '24 11:07 wangxf1987

continue to this PR,https://github.com/karmada-io/karmada/pull/5139

wangxf1987 avatar Jul 04 '24 12:07 wangxf1987

Can you share this grafana dashboard configuration? I think this is very useful @wangxf1987

when the port is defined by default, it it used by prothumes. like this. image image

yanfeng1992 avatar Oct 09 '24 10:10 yanfeng1992

Can you share this grafana dashboard configuration? I think this is very useful @wangxf1987

when the port is defined by default, it it used by prothumes. like this. image image

I think this is a good practice. Can be published on a blog. https://karmada.io/zh/blog @RainbowMango

wangxf1987 avatar Oct 10 '24 02:10 wangxf1987

I think this is a good practice. Can be published on a blog. https://karmada.io/zh/blog @RainbowMango

cc @RainbowMango

XiShanYongYe-Chang avatar Oct 10 '24 02:10 XiShanYongYe-Chang

Sure, I agree.

RainbowMango avatar Oct 10 '24 03:10 RainbowMango

Hi, @yanfeng1992 Could you post a new PR to this repo? https://github.com/karmada-io/website/tree/main/blog You can share practice about grafana monitoring.

wangxf1987 avatar Oct 10 '24 08:10 wangxf1987