Add ServiceMonitor and Dashboards For External Prometheus And Grafana

Open stephenstubbs opened this issue 5 years ago • 12 comments

Feature Request

Is your feature request related to a problem? Please describe: I have a prometheus operator installation with thanos and I would rather reuse this than run a separate prometheus and grafana deployment for each cluster.

Describe the feature you'd like: Service monitor and json dashboards for doing this.
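For illustration only, here is a rough sketch of the kind of ServiceMonitor object this request has in mind, built as a plain Python dict and printed as JSON. The `apiVersion`, `kind`, and `spec` fields follow the Prometheus Operator ServiceMonitor resource. Everything TiDB-specific is an assumption, not something confirmed in this thread: the label keys used in the selector, the `metrics` port name, the `/metrics` path, and the example cluster and namespace names.

```python
import json


def service_monitor(cluster, namespace, component, port_name="metrics"):
    """Build a ServiceMonitor manifest for one component of one cluster.

    The label keys and the default port name are assumptions for this
    sketch; check the labels and port names on the actual Services.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {
            "name": f"{cluster}-{component}",
            "namespace": namespace,
        },
        "spec": {
            # Only look for Services in the cluster's own namespace.
            "namespaceSelector": {"matchNames": [namespace]},
            # Hypothetical labels identifying the component's Service.
            "selector": {
                "matchLabels": {
                    "app.kubernetes.io/instance": cluster,
                    "app.kubernetes.io/component": component,
                }
            },
            # Scrape the named Service port every 15 seconds.
            "endpoints": [
                {"port": port_name, "path": "/metrics", "interval": "15s"}
            ],
        },
    }


if __name__ == "__main__":
    # "basic" and "tidb-cluster" are placeholder names.
    print(json.dumps(service_monitor("basic", "tidb-cluster", "tidb"), indent=2))
```

Since `kubectl apply -f -` accepts JSON, the printed manifest could be piped to it once the selector labels and port name have been checked against the real Services.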

Describe alternatives you've considered: Using the current TidbMonitor is a possibility, but in my opinion it wastes resources in this scenario.

stephenstubbs avatar Aug 28 '20 18:08 stephenstubbs

@sstubbs Do you mean that you would like a central Prometheus and Grafana to collect the metrics for all of the clusters and do not want to install any TidbMonitor object?

DanielZhangQD avatar Aug 29 '20 02:08 DanielZhangQD

We added support for one Prometheus monitoring multiple TidbClusters in #3155, and it has been merged into the master branch. You can use the Thanos example to integrate with your Prometheus Operator: https://github.com/pingcap/tidb-operator/tree/master/examples/monitor-with-thanos

mikechengwei avatar Aug 30 '20 16:08 mikechengwei

Great, thank you!

stephenstubbs avatar Aug 30 '20 18:08 stephenstubbs

@mikechengwei Thanks for the response! #3155 is mainly for heterogeneous clusters, where the heterogeneous cluster is effectively the same cluster as the original one. If we want to collect metrics from different ordinary TidbClusters (especially with TLS enabled) in different namespaces with one TidbMonitor, that would require a different solution.

@sstubbs Could you please describe your requirement in more detail?

DanielZhangQD avatar Aug 31 '20 01:08 DanielZhangQD

I see. I just find it easier to manage one Prometheus than several, and it's nice to be able to persist data to Thanos when needed and monitor everything in the cluster from a single Grafana dashboard. If this issue hasn't come up before, perhaps there aren't many people with this use case.

stephenstubbs avatar Aug 31 '20 02:08 stephenstubbs

OK, @mikechengwei may work on a solution that uses one Prometheus per TidbCluster, aggregates all of the metrics data with Thanos, and uses one single Grafana for dashboards. There is already a draft PR, https://github.com/pingcap/docs-tidb-operator/pull/640, but Mike may update it later.

DanielZhangQD avatar Aug 31 '20 05:08 DanielZhangQD

I see, thanks a lot. It's not a must-have feature from my side, but it would definitely be nice to have. A few of the other charts I'm using, such as Istio, are also moving toward providing service monitors and dashboards rather than bundling them. They have deprecated the bundled Prometheus as of 1.7.

I understand, though, that a lot of users prefer being able to quickly deploy bundled TidbMonitors.

stephenstubbs avatar Aug 31 '20 10:08 stephenstubbs

OK. We will evaluate this.

DanielZhangQD avatar Aug 31 '20 11:08 DanielZhangQD

We now provide a Prometheus Operator example, https://github.com/pingcap/tidb-operator/tree/master/examples/prometheus-operator, and a Thanos example, https://github.com/pingcap/tidb-operator/tree/master/examples/monitor-with-thanos. But the current approach is not elegant.

I think we can do the following:

  1. Optimize the Thanos configuration in TidbMonitor by adding a Thanos spec like the one in Prometheus Operator.
  2. Based on the Thanos requirement, provide an elegant Prometheus Operator example.

mikechengwei avatar Aug 31 '20 12:08 mikechengwei

@mikechengwei Could you please help improve this example to support TiFlash, TiCDC, and Pump? Thanks!

DanielZhangQD avatar Mar 10 '22 09:03 DanielZhangQD

OK, so one TidbMonitor per component, and then Thanos aggregates all of the TidbMonitors, right?

mikechengwei avatar Mar 11 '22 02:03 mikechengwei

I do not recommend using a user-defined ServiceMonitor.

mikechengwei avatar Mar 12 '22 13:03 mikechengwei