monitoring issues

Bug: TiKV-Details Memory metric display includes the TiDB instance.

2

![image](https://user-images.githubusercontent.com/13257127/140645182-cec9f602-ea3f-469a-acb1-edbfabd11179.png) ![image](https://user-images.githubusercontent.com/13257127/140645172-0915ae76-8a3f-41ae-b9b7-31e5a44e6bdf.png)

mikechengwei

Add monitoring for per-query memory usage

I'd like to see a graph for TiDB that shows per-query memory usage (avg, p90, p99, etc.). I think it would be very helpful to know if there are a...
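The request asks for avg, p90, and p99 of per-query memory. The Python sketch below shows how those three statistics summarize a list of per-query memory samples, using the nearest-rank percentile method. The sample values and the function names `summarize` and `nearest_rank` are hypothetical. The sketch does not show how TiDB collects or exports per-query memory.

```python
import math


def nearest_rank(sorted_vals, pct):
    """Nearest-rank percentile of an ascending list (pct in 0..100)."""
    if not sorted_vals:
        raise ValueError("no samples")
    rank = math.ceil(pct * len(sorted_vals) / 100)
    # Clamp so pct == 0 still returns the smallest sample.
    return sorted_vals[max(rank, 1) - 1]


def summarize(mem_usage):
    """Return avg, p90 and p99 of per-query memory usage samples."""
    vals = sorted(mem_usage)
    if not vals:
        raise ValueError("no samples")
    return {
        "avg": sum(vals) / len(vals),
        "p90": nearest_rank(vals, 90),
        "p99": nearest_rank(vals, 99),
    }


# Hypothetical peak memory per query, in MiB.
samples = [12, 8, 30, 25, 7, 400, 15, 9, 11, 14]
print(summarize(samples))  # {'avg': 53.1, 'p90': 30, 'p99': 400}
```

The single 400 MiB query pulls the average to 53.1 and sets p99, while p90 stays at 30. An average alone would hide that pattern.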

kolbe

Display buddyinfo and meminfo in Grafana

Add a new panel to the node_exporter dashboard to display the buddyinfo and meminfo metrics. meminfo metrics: display Buffers, Dirty, MemFree, Mlocked, Slab, and Writeback. buddyinfo metrics: display the number of pages in each node/zone...
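For context on what a buddyinfo panel would plot, here is a Python sketch that parses the Linux `/proc/buddyinfo` text format. Each line has the form `Node N, zone NAME c0 c1 c2 ...`, where `ck` is the number of free blocks of 2^k pages. The function names are hypothetical, and the sample text is illustrative, not taken from a real host. The sketch does not reproduce node_exporter's own collector.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free blocks of order 0, 1, 2, ...]}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        node_part, rest = line.split(",", 1)   # "Node 0" | " zone DMA 1 1 ..."
        node = int(node_part.split()[1])
        fields = rest.split()
        # fields[0] is the literal word "zone", fields[1] is the zone name.
        zone = fields[1]
        result[(node, zone)] = [int(n) for n in fields[2:]]
    return result


def free_pages(counts):
    """Total free pages: sum over orders of count * 2**order."""
    return sum(c * (1 << order) for order, c in enumerate(counts))


# Illustrative /proc/buddyinfo content.
SAMPLE = """\
Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
Node 0, zone    DMA32      3      5      2      4      1      0      0      0      0      0      0
"""

for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
    print(node, zone, free_pages(counts))
```

Plotting the per-order counts, rather than only the total, shows fragmentation: memory can be free overall while high-order blocks are exhausted.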

wangkedba

Add TiFlash-Proxy-Details dashboard

DanielZhangQD

K8s monitor deployment should use the same DNS name as the image

Currently, the DNS name in the k8s-monitor YAML is http://prometheus-k8s.monitoring.svc, but our image uses http://k8s-prometheus.monitoring.svc.
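One way to resolve the mismatch is to rewrite the manifest so it uses the service name the image expects. The Python sketch below does that with a plain string replacement. The two URLs come from the issue. The manifest fragment, its `PROM_ADDR` field, and the `:9090` port are hypothetical, and the function names are illustrative.

```python
# URLs quoted in the issue: the manifest uses OLD_URL, the image uses NEW_URL.
OLD_URL = "http://prometheus-k8s.monitoring.svc"
NEW_URL = "http://k8s-prometheus.monitoring.svc"


def align_prometheus_url(manifest: str) -> str:
    """Point every reference to the old Prometheus service at the image's name."""
    return manifest.replace(OLD_URL, NEW_URL)


def has_old_url(manifest: str) -> bool:
    """True if the manifest still references the old service name."""
    return OLD_URL in manifest


# Hypothetical fragment of the k8s-monitor YAML (field names are illustrative).
manifest = """\
env:
  - name: PROM_ADDR
    value: http://prometheus-k8s.monitoring.svc:9090
"""

fixed = align_prometheus_url(manifest)
print(fixed)
```

Changing the image to use `prometheus-k8s` instead would work equally well. What matters is that both sides agree on one name.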

qiffang

Need to update rules

replace_expr:
  - rule_name: "pd_cluster_low_space"
    expr: '(sum(pd_cluster_status{type="store_low_space_count"}) by (instance) > 0) and (sum(etcd_server_is_leader) by (instance) > 0)'
  - rule_name: "pd_cluster_lost_connect_tikv_nums"
    expr: '(sum ( pd_cluster_status{type="store_disconnected_count"} ) by (instance) > 0) and (sum(etcd_server_is_leader)...
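The rules above gate each alert with `and (sum(etcd_server_is_leader) by (instance) > 0)`. In PromQL, a comparison such as `> 0` without `bool` drops samples that fail it. `a and b` keeps only the samples of `a` whose label set also appears in `b`. After `by (instance)` the only label left is `instance`, so the alert survives only on the PD instance that is currently the etcd leader. The Python sketch below imitates that filtering on hypothetical values, with label sets simplified to instance names. It illustrates the set semantics and is not Prometheus code.

```python
def gt_zero(vector):
    """PromQL `> 0` without `bool`: drop samples whose value is not > 0."""
    return {labels: v for labels, v in vector.items() if v > 0}


def and_op(left, right):
    """PromQL `and`: keep left samples whose label set also exists on the right."""
    return {labels: v for labels, v in left.items() if labels in right}


# Hypothetical per-instance values after `sum(...) by (instance)`.
low_space = {"pd-0": 2, "pd-1": 2, "pd-2": 2}  # all three PDs report a non-zero count
is_leader = {"pd-0": 0, "pd-1": 1, "pd-2": 0}  # only pd-1 is the etcd leader

firing = and_op(gt_zero(low_space), gt_zero(is_leader))
print(firing)  # {'pd-1': 2}
```

Without the `and` clause, `gt_zero(low_space)` alone would yield three alerts, one per PD, for the same cluster condition.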

qiffang

Grafana chart loses history when the leader changes

4

Somewhat like #2. Grafana shows information for each PD even though it only has data for the PD leader. Then, if the leader changes, the chart for the new leader...

gregwebs

Fix PD_miss_peer_region_count

2

close https://github.com/pingcap/monitoring/issues/18

sokada1221

Adjust the duration threshold for the alerts dependent on disk latency

Alerts that depend on disk latency are too sensitive for public cloud resources and keep triggering on every disk latency fluctuation. The following is the list of alerts that should...
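In Prometheus, the duration threshold of an alerting rule is its `for:` clause. The alert stays pending until its expression has been true on consecutive evaluations for at least that long, and it resets as soon as one evaluation is false. The Python sketch below simulates that behavior to show why a longer duration suppresses short spikes. The latency values, the threshold, the 15-second interval, and the durations are hypothetical, not the values used by the affected alerts.

```python
def firing_times(samples, threshold, for_seconds, interval):
    """Return evaluation timestamps (seconds) at which the alert is firing.

    samples: one latency value per evaluation, `interval` seconds apart.
    The alert fires once the value has stayed above `threshold` for at least
    `for_seconds` of consecutive evaluations (a simplified Prometheus `for:`).
    """
    fired = []
    pending_since = None
    for i, value in enumerate(samples):
        t = i * interval
        if value > threshold:
            if pending_since is None:
                pending_since = t  # condition just became true: pending
            if t - pending_since >= for_seconds:
                fired.append(t)
        else:
            pending_since = None  # any false evaluation resets the timer
    return fired


# Hypothetical disk latency (ms): a short spike, then a sustained rise.
latency = [5, 80, 90, 5, 5, 70, 75, 85, 90, 95, 99, 5]
print(firing_times(latency, 50, 0, 15))   # alerts on the spike too
print(firing_times(latency, 50, 60, 15))  # alerts only on the sustained rise
```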

sokada1221

Add the PD leader check for PD_miss_peer_region_count

PD_miss_peer_region_count kept triggering because it did not check whether the alert was issued by the current PD leader. I see that we currently adjust pd_cluster_low_space, pd_cluster_lost_connect_tikv_nums and pd_pending_peer_region_count...

sokada1221
