[BACKLOG] Monitoring Systems (metrics collection and visualization)

Open bombnp opened this issue 2 years ago • 0 comments

Problem

There's currently no monitoring dashboard for our system in the following categories:

  1. Resource usage (CPU, Memory, Disk, I/O) -> can utilize open-source metric exporters
  2. Performance (Latency, Error rates) -> requires custom logging?
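For the second category, one option besides custom logging is to have each service record latency and errors itself and expose them in the Prometheus text format. The sketch below only illustrates the idea with the standard library. The metric names, the `route` label, and the bucket bounds are all assumptions, and a real service would more likely use an existing client library such as `prometheus_client`.

```python
from collections import defaultdict

# Hypothetical bucket upper bounds in seconds; tune to the service's real latencies.
BUCKETS = (0.05, 0.1, 0.5, 1.0, float("inf"))


class RequestMetrics:
    """Tracks request counts, errors, and a cumulative latency histogram per route."""

    def __init__(self):
        self.requests = defaultdict(int)
        self.errors = defaultdict(int)
        self.latency_sum = defaultdict(float)
        self.bucket_counts = defaultdict(lambda: [0] * len(BUCKETS))

    def observe(self, route, seconds, ok=True):
        """Record one finished request and how long it took."""
        self.requests[route] += 1
        if not ok:
            self.errors[route] += 1
        self.latency_sum[route] += seconds
        # Prometheus histograms are cumulative: bump every bucket whose bound fits.
        for i, bound in enumerate(BUCKETS):
            if seconds <= bound:
                self.bucket_counts[route][i] += 1

    def error_rate(self, route):
        """Fraction of recorded requests on this route that failed."""
        total = self.requests.get(route, 0)
        return self.errors.get(route, 0) / total if total else 0.0

    def render(self):
        """Render all metrics in the Prometheus text exposition format."""
        name = "http_request_duration_seconds"  # assumed metric name
        lines = [f"# TYPE {name} histogram"]
        for route in sorted(self.requests):
            for bound, count in zip(BUCKETS, self.bucket_counts[route]):
                le = "+Inf" if bound == float("inf") else str(bound)
                lines.append(f'{name}_bucket{{route="{route}",le="{le}"}} {count}')
            lines.append(f'{name}_sum{{route="{route}"}} {self.latency_sum[route]}')
            lines.append(f'{name}_count{{route="{route}"}} {self.requests[route]}')
        lines.append("# TYPE http_request_errors_total counter")
        for route in sorted(self.requests):
            lines.append(f'http_request_errors_total{{route="{route}"}} {self.errors.get(route, 0)}')
        return "\n".join(lines) + "\n"
```

Serving the output of `render()` on a `/metrics` endpoint would let Prometheus scrape it like any other exporter, so latency and error rates could share dashboards with the resource metrics.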

Our cluster resources are starting to run out, so we want to identify where resource usage can be optimized.

Task Description

Create a monitoring system consisting of metrics collection (through exporters and Prometheus) and visualization (through Grafana?). Visualize metrics by resource, node, pod, or other Kubernetes API objects as needed.
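As a rough sketch of what the per-node views could query, the snippet below builds requests for the Prometheus HTTP API (`/api/v1/query`) and parses an instant-vector response. The two PromQL expressions use standard node-exporter metrics. `PROMETHEUS_URL` is a hypothetical address, and `query_url` and `parse_vector` are illustrative helpers, not part of any existing setup.

```python
import json
from urllib.parse import urlencode

# Hypothetical in-cluster address; the real service name and port may differ.
PROMETHEUS_URL = "http://prometheus.lens-metrics.svc:9090"

# CPU busy percentage per node, averaged over all cores for the last 5 minutes.
CPU_BY_NODE = '100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))'
# Memory used percentage per node.
MEM_BY_NODE = "100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)"


def query_url(base, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body, label="instance"):
    """Map each series in an instant-vector response to {label value: sample value}."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each sample is [unix_timestamp, "value as string"].
    return {
        series["metric"].get(label, ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }
```

Fetching `query_url(PROMETHEUS_URL, CPU_BY_NODE)` with `urllib.request.urlopen` and passing the response body to `parse_vector` would give one value per node. The same expressions can be pasted into a Grafana panel if Grafana ends up being the visualization layer.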

Additional Context

For now, I've enabled the Prometheus + node-exporter + kube-state-metrics stack integration from Lens (in the lens-metrics namespace). It can visualize the usage of a specific node or pod, but not multiple at the same time. You will likely use the same stack, but some metric exporters will have to be installed ourselves.

Related Teams

  • [ ] Frontend
  • [ ] Backend
  • [ ] Data
  • [ ] Design
  • [X] Infra
  • [ ] QA

Task Advisors

@bombnp

bombnp, Jan 10 '23 02:01