[Metrics] Extend metrics to measure physical performance stats
It would be great to extend our monitoring infrastructure to measure more physical performance stats. To start with, @blackhat06 suggested tracking the following resource metrics:
- [ ] Disk IO: % time that device was busy
- [ ] Memory: % of total memory capacity in use
- [ ] CPU Utilization: 5, 10, and 15 min averages
- [ ] Memory Utilization: Breakdown by memory
- [ ] % volume usage: all mounted disks
- [ ] Bits in/out (Ethernet)
- [ ] Volume I/O
- [ ] Process count / running/blocked
Prometheus can track these with the node exporter: https://github.com/prometheus/node_exporter/blob/master/README.md
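As a rough illustration of how one of the metrics above could be derived from node exporter output, here is a minimal Python sketch. It assumes the exporter serves the Prometheus text format on its default port 9100 and uses the `node_memory_*_bytes` metric names from recent node exporter releases. The helper names (`parse_samples`, `memory_used_percent`) and the sample values are made up for this example, and the parser assumes samples carry no trailing timestamp.

```python
def parse_samples(text):
    """Parse Prometheus text exposition lines into {series: value}.

    Assumes each sample line is "<series> <value>" with no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def memory_used_percent(samples):
    """Memory: % of total memory capacity in use."""
    total = samples["node_memory_MemTotal_bytes"]
    available = samples["node_memory_MemAvailable_bytes"]
    return 100.0 * (total - available) / total


# Illustrative payload. In practice the text would come from something like
# urllib.request.urlopen("http://localhost:9100/metrics").read().decode()
SAMPLE = """\
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8e+09
node_memory_MemAvailable_bytes 2e+09
node_load5 0.42
node_procs_running 3
node_procs_blocked 0
"""

if __name__ == "__main__":
    samples = parse_samples(SAMPLE)
    print(memory_used_percent(samples))
```

In a real deployment Prometheus would scrape the endpoint itself and the percentage would be computed in a query, so this only shows what the raw data looks like.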
For Kubernetes, we can just scrape kube-api-server/metrics, since Kubernetes exposes Prometheus metrics there.
Update:
- For Docker, we can safely assume the user only has one node, so we can just run a node exporter at startup.
- The Kubernetes API server does expose metrics, but they only cover the API server's own requests and etcd usage. We should use a DaemonSet instead, which will deploy a Prometheus node exporter to each node (https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/). In fact, this is the practice Kubernetes recommends.
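A minimal sketch of what such a DaemonSet could look like, built as a Python dict and dumped to JSON (which `kubectl apply -f` accepts). The `monitoring` namespace, the `node-exporter` name and labels, and the function name are assumptions made for illustration. The `prom/node-exporter` image and port 9100 are the exporter's published defaults. Host `/proc` and `/sys` mounts, which a fuller setup would likely add, are left out to keep the example short.

```python
import json


def node_exporter_daemonset(namespace="monitoring", image="prom/node-exporter"):
    """Build a DaemonSet manifest that runs one node exporter pod per node."""
    labels = {"app": "node-exporter"}  # hypothetical label, used by the selector
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "node-exporter", "namespace": namespace},
        "spec": {
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Share the host network and PID namespace so the
                    # exporter reports node-level stats, not pod-level ones.
                    "hostNetwork": True,
                    "hostPID": True,
                    "containers": [
                        {
                            "name": "node-exporter",
                            "image": image,
                            "ports": [{"containerPort": 9100, "name": "metrics"}],
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Write the manifest to stdout; it can be piped to `kubectl apply -f -`.
    print(json.dumps(node_exporter_daemonset(), indent=2))
```

Prometheus would then need a scrape config that discovers these pods, which is not shown here.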
Awesome. cc @blackhat06
@simon-mo Is this handled?