Proposal: Custom Metrics for Tasks and Pipelines
Objective
Better support for monitoring TaskRuns and PipelineRuns
How
A new abstraction that lets you customize how metrics are exported. This way you can control the cardinality without compromising our ability to get highly specific metrics.
Use case
Currently, Tekton Pipelines' metrics can be configured by changing config-observability. However, the available options are very limited. You can set metrics.taskrun.level: task, but then you can't aggregate further than that. Setting metrics.taskrun.level: taskrun is not recommended, since it can lead to unbounded cardinality.
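For reference, here is a minimal sketch of what that setting looks like in the ConfigMap. It assumes a default installation in the tekton-pipelines namespace; only the metrics.taskrun.level key comes from the text above, and the metrics.pipelinerun.level line is included as the PipelineRun counterpart.

```yaml
# Sketch of the existing knob, assuming the default install namespace.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-observability
  namespace: tekton-pipelines
data:
  # "task" aggregates metrics per Task; "taskrun" would add one label
  # value per run, which is where the unbounded cardinality comes from.
  metrics.taskrun.level: "task"
  # PipelineRun counterpart of the setting above.
  metrics.pipelinerun.level: "pipeline"
```

Neither level lets you pick which dimensions end up as labels, which is the gap this proposal is about.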
Just to be more specific, let me share some examples of use cases:
Environment monitoring
Let's say you have a task that receives environment as a parameter (prod, staging, qa, etc.). You might want to analyze: is there an environment that is slower to run this task? How much slower? Is the qa environment being used at all?
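To make the scenario concrete, a task like that might look roughly like the sketch below. The task name, image, and script are hypothetical and only there for illustration; the relevant part is the environment param, whose value is the dimension you would want to break the metrics down by.

```yaml
# Hypothetical Task, for illustration only.
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: deploy-app  # hypothetical name
spec:
  params:
    - name: environment
      type: string
      description: Target environment (prod, staging, qa, etc.)
  steps:
    - name: deploy
      image: alpine  # placeholder image
      script: |
        echo "Deploying to $(params.environment)"
```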
Task optimization
You might have just merged a test fix and want to know: how much did that fix improve the duration of the integration tests? Did it improve the error rate of that task when run with similar parameters?
Anomaly Detection
Is the CI/CD platform executing tasks normally? Are there too many tasks coming from a single repository? Are all tasks failing?