monitoring
 
I'd like to see a graph for TiDB that shows per-query memory usage (avg, p90, p99, etc.). I think it would be very helpful to know if there are a...
Add a new panel to the node_exporter dashboard to display the buddyinfo and meminfo metrics. meminfo metrics: display Buffer, Dirty, MemFree, Mlocked, Slab, Writeback. buddyinfo metrics: display the number of pages in each node/zone...
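To make the buddyinfo part of the request concrete, here is a minimal sketch of how the per-node/zone figures could be derived from the text of `/proc/buddyinfo`. It assumes the usual kernel layout, where each column after the zone name is the number of free blocks of order 0, 1, 2, and so on. The function names `parse_buddyinfo` and `free_pages` and the sample text are hypothetical and are not taken from node_exporter or the dashboard.

```python
def parse_buddyinfo(text):
    """Parse /proc/buddyinfo text into {(node, zone): [free blocks per order]}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node <n>, zone <name> <order-0 count> <order-1 count> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zone = parts[3]
        result[(node, zone)] = [int(count) for count in parts[4:]]
    return result


def free_pages(counts):
    """Total free pages: a block of order k spans 2**k pages."""
    return sum(count << order for order, count in enumerate(counts))


# Made-up sample with five orders per zone (real systems usually list more).
SAMPLE = (
    "Node 0, zone      DMA      1      1      0      0      2\n"
    "Node 0, zone   Normal     10      4      2      1      0\n"
)

zones = parse_buddyinfo(SAMPLE)
for (node, zone), counts in sorted(zones.items()):
    print(f"node={node} zone={zone} free_pages={free_pages(counts)}")
```

A panel could plot either the raw per-order counts, which show fragmentation, or the summed page total per node/zone. Which of the two the request intends is not stated in the summary.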
For now, the Prometheus DNS name in the k8s-monitor YAML is http://prometheus-k8s.monitoring.svc, but our image uses http://k8s-prometheus.monitoring.svc
replace_expr:
  - rule_name: "pd_cluster_low_space"
    expr: '(sum(pd_cluster_status{type="store_low_space_count"}) by (instance) > 0) and (sum(etcd_server_is_leader) by (instance) > 0)'
  - rule_name: "pd_cluster_lost_connect_tikv_nums"
    expr: '(sum ( pd_cluster_status{type="store_disconnected_count"} ) by (instance) > 0) and (sum(etcd_server_is_leader)...
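As a rough illustration of what a `replace_expr` list like this could do, the sketch below overrides the `expr` of matching alert rules by name. It assumes that `rule_name` is matched against the `alert` field of a Prometheus alerting rule. The function `apply_replace_expr`, the sample rules, and their original expressions are hypothetical, and the real tool may work differently.

```python
import copy


def apply_replace_expr(rules, replacements):
    """Return a copy of `rules` with `expr` overridden for matching rule names."""
    overrides = {item["rule_name"]: item["expr"] for item in replacements}
    patched = copy.deepcopy(rules)  # leave the caller's rules untouched
    for rule in patched:
        name = rule.get("alert")
        if name in overrides:
            rule["expr"] = overrides[name]
    return patched


# Hypothetical input rules; only the replacement expression comes from the text above.
rules = [
    {"alert": "pd_cluster_low_space",
     "expr": 'sum(pd_cluster_status{type="store_low_space_count"}) by (instance) > 0'},
    {"alert": "hypothetical_untouched_rule", "expr": "up == 0"},
]
replacements = [
    {"rule_name": "pd_cluster_low_space",
     "expr": '(sum(pd_cluster_status{type="store_low_space_count"}) by (instance) > 0)'
             ' and (sum(etcd_server_is_leader) by (instance) > 0)'},
]

patched = apply_replace_expr(rules, replacements)
for rule in patched:
    print(rule["alert"], "=>", rule["expr"])
```

Rules that are not named in the replacement list pass through unchanged, so the override file only needs to list the alerts that differ.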
Somewhat like #2: Grafana shows information for every PD even though it only has data for the PD leader. Then, if the leader changes, the chart for the new leader...
close https://github.com/pingcap/monitoring/issues/18
Alerts that depend on disk latency are too sensitive for public cloud resources and keep triggering on every disk latency fluctuation. The following is the list of alerts that should...
PD_miss_peer_region_count kept triggering because it did not check whether the alert was issued by the current PD leader. I see that we currently adjust pd_cluster_low_space, pd_cluster_lost_connect_tikv_nums and pd_pending_peer_region_count...
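The leader check described here can be sketched as a small helper that appends the same `etcd_server_is_leader` guard the adjusted rules already use. The helper name `gate_on_pd_leader` is hypothetical, and the example base expression for PD_miss_peer_region_count is a placeholder rather than the rule shipped in the repository.

```python
# Guard taken from the adjusted rules: true only on the current PD leader.
LEADER_GUARD = "(sum(etcd_server_is_leader) by (instance) > 0)"


def gate_on_pd_leader(expr):
    """Wrap a PromQL alert expression so it fires only on the PD leader."""
    if LEADER_GUARD in expr:
        return expr  # already gated; avoid stacking the guard twice
    return f"({expr}) and {LEADER_GUARD}"


# Placeholder expression (assumption): the metric selector and threshold
# are illustrative only.
base = 'sum(pd_regions_status{type="miss_peer_region_count"}) by (instance) > 100'
gated = gate_on_pd_leader(base)
print(gated)
```

Both sides of the `and` keep the `instance` label, so the guard matches per PD instance and drops series reported by followers.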