feat(metrics): add Vector Throughput & health (via prometheus)
Add a dashboard to monitor Vector throughput and log loss. The dashboard should show throughput for the following pipes:
Throughput
- stdout -> loki
- stdout -> opensearch
- kafka -> loki
- kafka -> transform -> opensearch

Each flow should show incoming, outgoing, and dropped events as a time series chart.
The Kafka source should also include consumer lag metrics.
Health (this would be primarily powered by these metrics)
- CPU usage of vector
- Memory usage of vector
- buffer size
- errors happening in transforms
- utilization of each component
Ideally, we can take most of the components from an openly available source and modify some of them to fit our setup.
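As a starting point for the panels listed above, here is a rough sketch of the PromQL expressions they might use. It assumes Vector's `internal_metrics` source is exported through a `prometheus_exporter` sink with the default `vector` namespace. The metric and label names should be checked against the actual `/metrics` output, and the component IDs in `PIPES` are hypothetical placeholders.

```python
# Assumed metric names (internal_metrics + prometheus_exporter, "vector" namespace).
RECEIVED = "vector_component_received_events_total"
SENT = "vector_component_sent_events_total"
DISCARDED = "vector_component_discarded_events_total"

# Hypothetical component IDs: (source_id, sink_id) for each pipe in the issue.
PIPES = {
    "stdout -> loki": ("stdout_in", "loki_from_stdout"),
    "stdout -> opensearch": ("stdout_in", "opensearch_from_stdout"),
    "kafka -> loki": ("kafka_in", "loki_from_kafka"),
    "kafka -> transform -> opensearch": ("kafka_in", "opensearch_from_kafka"),
}


def rate_query(metric, component_id, window="5m"):
    """Build a per-second event rate expression for one component."""
    return f'sum(rate({metric}{{component_id="{component_id}"}}[{window}]))'


def throughput_queries(source_id, sink_id):
    """Incoming, outgoing, and dropped event rates for one pipe."""
    return {
        "incoming": rate_query(RECEIVED, source_id),
        "outgoing": rate_query(SENT, sink_id),
        "dropped": rate_query(DISCARDED, sink_id),
    }


# Assumed names for the Kafka lag and health panels; verify before use.
HEALTH_QUERIES = {
    "kafka consumer lag": "vector_kafka_consumer_lag",
    "buffer size": "vector_buffer_events",
    "transform errors": (
        "sum by (component_id) "
        '(rate(vector_component_errors_total{component_kind="transform"}[5m]))'
    ),
    "utilization": "vector_utilization",
}

if __name__ == "__main__":
    for pipe, (source_id, sink_id) in PIPES.items():
        print(pipe)
        for name, expr in throughput_queries(source_id, sink_id).items():
            print(f"  {name}: {expr}")
```

CPU and memory usage are left out of this sketch because the metric names depend on how Vector is deployed (for example, container-level or host-level exporters).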
@lsampras I am interested in working on this task
Hey @Prashant-dot1, thanks for your interest. This issue is available for contribution.
Since this is a somewhat open issue without fixed specifications, we would like a few details about the implementation:
- is there any existing dashboard that you would be using entirely or as a reference?
- do you plan to create your own dashboard for this?
@lsampras I am thinking of using these openly available dashboards as a starting point (they would need modification for this task):
Health metrics or system-level metrics, tracking how well the Vector instance is handling all the event pipes together - https://grafana.com/grafana/dashboards/19649-vector-monitoring/
https://grafana.com/grafana/dashboards/721-kafka/
The dashboard structure could be something like this:
- Row 1: Four panels (one for each pipeline) that show throughput metrics: incoming, outgoing, and dropped events.
- Row 2: Kafka metrics, specifically consumer lag for the Kafka-related pipelines.
- Row 3: General health metrics like CPU usage, memory usage, buffer utilization, and error tracking for the overall system.
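The three-row layout could be sketched as a Grafana dashboard JSON skeleton roughly like this. It is only an illustration: the panel titles, the dashboard title, and the placeholder expressions are assumptions, and the real queries would come from the referenced dashboards after modification.

```python
import json

GRID_WIDTH = 24  # Grafana lays panels out on a 24-column grid


def row(title, y):
    """A row header panel spanning the full grid width."""
    return {"type": "row", "title": title, "collapsed": False, "panels": [],
            "gridPos": {"h": 1, "w": GRID_WIDTH, "x": 0, "y": y}}


def timeseries(title, x, y, w, exprs):
    """A time series panel with one Prometheus target per expression."""
    targets = [{"refId": chr(ord("A") + i), "expr": expr}
               for i, expr in enumerate(exprs)]
    return {"type": "timeseries", "title": title, "targets": targets,
            "gridPos": {"h": 8, "w": w, "x": x, "y": y}}


def section(title, y, panel_specs):
    """A row header followed by equally wide panels on a single line."""
    width = GRID_WIDTH // len(panel_specs)
    panels = [row(title, y)]
    for i, (name, exprs) in enumerate(panel_specs):
        panels.append(timeseries(name, i * width, y + 1, width, exprs))
    return panels


def build_dashboard():
    # Placeholder expressions; replace with the real PromQL for each panel.
    events = ["<incoming expr>", "<outgoing expr>", "<dropped expr>"]
    pipes = ["stdout -> loki", "stdout -> opensearch",
             "kafka -> loki", "kafka -> transform -> opensearch"]
    panels = []
    panels += section("Throughput", 0, [(p, events) for p in pipes])
    panels += section("Kafka", 9, [
        (f"Consumer lag: {p}", ["<consumer lag expr>"])
        for p in pipes if p.startswith("kafka")
    ])
    panels += section("Health", 18, [
        ("CPU usage", ["<cpu expr>"]),
        ("Memory usage", ["<memory expr>"]),
        ("Buffer utilization", ["<buffer expr>"]),
        ("Errors", ["<errors expr>"]),
    ])
    return {"title": "Vector Throughput & Health", "panels": panels}


if __name__ == "__main__":
    print(json.dumps(build_dashboard(), indent=2))
```

Each row header is one grid unit high and each chart is eight units high, which is why the sections start at `y` offsets 0, 9, and 18.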
@Prashant-dot1 the shared design looks good... I'll assign this