Proposal: Custom Metrics for Tasks and Pipelines

Open RafaeLeal opened this issue 1 year ago • 0 comments

Objective

Better support for monitoring TaskRuns and PipelineRuns

How

A new abstraction that lets you customize how metrics are exported. This way you can control the cardinality without compromising our ability to get highly specific metrics.

Use case

Currently, Tekton Pipelines' metrics can be configured by changing config-observability. However, the available options are very limited. You can set metrics.taskrun.level: task, but then you can't aggregate further than that. Setting metrics.taskrun.level: taskrun is not recommended, since it can lead to unbounded cardinality.
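To make the cardinality trade-off concrete, here is a minimal sketch in plain Python. It does not use any Tekton API. The run records, the `series_count` helper, and the "custom" label set are all hypothetical, and it assumes that each level roughly corresponds to labeling metrics by task name or by TaskRun name.

```python
# Hypothetical TaskRun records; all names and values are illustrative only.
runs = [
    {
        "task": "integration-tests",
        "taskrun": f"integration-tests-run-{i}",
        "environment": env,
    }
    for i, env in enumerate(["prod", "staging", "qa", "prod", "staging", "prod"])
]


def series_count(runs, label_keys):
    """Count distinct label combinations, i.e. the number of time series."""
    return len({tuple(run[key] for key in label_keys) for run in runs})


# Assumption: roughly what metrics.taskrun.level: task produces (one series per task).
task_level = series_count(runs, ["task"])

# Assumption: roughly what metrics.taskrun.level: taskrun produces (one series per run).
taskrun_level = series_count(runs, ["taskrun"])

# Hypothetical custom labeling: task plus a bounded parameter.
custom_level = series_count(runs, ["task", "environment"])

print(task_level, taskrun_level, custom_level)  # prints: 1 6 3
```

The per-run series count grows with every new TaskRun, while the custom label set stays bounded by the number of distinct environments.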

Just to be more specific, let me share some examples of use cases:

Environment monitoring

Let's say you have a task that receives environment as a parameter (prod, staging, qa, etc.). You might want to analyze: Is there an environment that is slower to run this task? How much slower? Is the qa environment being used at all?
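The questions above could be answered if durations were exported with the environment parameter as a label. The following is only a sketch of the analysis in plain Python, under the assumption that such data is available. The observations, the `known_environments` list, and the helper function are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (environment, duration in seconds) observations for one task.
observations = [
    ("prod", 120.0),
    ("prod", 140.0),
    ("staging", 90.0),
    ("staging", 110.0),
]
# Hypothetical list of environments the task is expected to run in.
known_environments = ["prod", "staging", "qa"]


def mean_duration_by_environment(observations):
    """Group durations by environment and return the mean for each group."""
    grouped = defaultdict(list)
    for environment, seconds in observations:
        grouped[environment].append(seconds)
    return {environment: mean(values) for environment, values in grouped.items()}


averages = mean_duration_by_environment(observations)

# Environments with no observations at all are not being used.
unused = [env for env in known_environments if env not in averages]

# How much slower prod is than staging, as a ratio of mean durations.
slowdown = averages["prod"] / averages["staging"]
```

With this sample data, prod averages 130 seconds against 100 for staging, and qa shows up as unused.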

Task optimization

You might have just merged a test fix and want to know: How much did that fix improve the duration of the integration tests? Did it improve the error rate of that task when run with similar parameters?

Anomaly Detection

Is the CI/CD platform executing tasks normally? Are there too many tasks coming from a single repository? Are all tasks failing?

RafaeLeal, Sep 28 '23 20:09