Epic: Production Metrics for Results
Feature Request
Add/expose Prometheus metrics that are useful for monitoring the Results apiserver and watchers.
This issue is intended to be an "epic" with linked sub-issues for specific metrics that Results should expose.
Notes
Originally posted by @adambkaplan in https://github.com/tektoncd/results/pull/294#discussion_r1071701477
A few suggestions for relevant metrics in my opinion:
API server
- Total number of errors encountered while processing requests.
- Time taken to process each request.
- GORM-related metrics (e.g. how long each query takes to complete, possibly grouped by gRPC operation).
Note: I am not familiar with the gRPC ecosystem, but I think that many of those metrics are already exposed out-of-the-box. So, we could confirm that and consider what else we need to instrument.
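To illustrate why exposing request latency as a histogram (rather than only a running total) would be useful, here is a minimal sketch of how a latency quantile can be estimated from cumulative histogram buckets. It is loosely modeled on what PromQL's `histogram_quantile` does, with simplifications; the function name and the bucket values are hypothetical and not taken from Results.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count) pairs sorted by
    upper bound and ending with (math.inf, total_count). Hypothetical helper,
    simplified relative to what Prometheus itself does.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank and count > prev_count:
            if bound == math.inf:
                # The rank falls past the last finite bucket, so the best
                # available answer is that bucket's upper bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up request latency buckets: 50 requests took <= 0.1s, 90 took <= 0.5s, ...
EXAMPLE_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
```

With these made-up buckets, `estimate_quantile(0.95, EXAMPLE_BUCKETS)` lands between 0.5s and 1.0s, which a plain request total could never tell us. The estimate is only as precise as the bucket boundaries, so the choice of buckets matters.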
Watcher
- Error rate.
- How long requests made to the gRPC server take to return.
- How long the reconciliation loop is taking.
- Total number of deleted objects.
- Work queue metrics (e.g. lag).
Knative already exposes a few metrics about controllers. So, we could confirm if they're already in place and what else we need to instrument.
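For the error rate item, a rough sketch of how a rate could be derived from two scrapes of plain counters, assuming the watcher exposed one counter for errors and one for total reconciles. Both counters and the function names are hypothetical; nothing here reflects metrics the watcher is known to export.

```python
def counter_increase(previous, current):
    """Increase of a monotonically increasing counter between two scrapes.

    A lower current value is treated as a counter reset (for example after a
    pod restart), in which case everything counted since the reset is new.
    """
    return current if current < previous else current - previous


def error_ratio(prev_errors, cur_errors, prev_total, cur_total):
    """Fraction of operations that failed between two scrapes."""
    total = counter_increase(prev_total, cur_total)
    if total == 0:
        return 0.0  # no traffic in the window, so no errors either
    return counter_increase(prev_errors, cur_errors) / total
```

For example, going from 10 to 15 errors while the total goes from 100 to 200 gives a ratio of 0.05 for that window. In practice this arithmetic would normally be left to the monitoring system's query language; the point is only that counters for both errors and totals are enough to derive the rate.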
/assign enarha
I just started looking more seriously into this. One easy way to see what we currently export through the gRPC middleware is to port-forward port 9090 of the tekton-results-api pod and run `curl 127.0.0.1:9090/metrics`. The output consists mostly of running totals, which are not very helpful on their own. The watcher also exports some metrics out of the box. I'll continue digging.
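To get a quick overview of what the endpoint exposes, the scraped text can be grouped by metric family and type. Below is a minimal sketch that parses the Prometheus text exposition format using only the standard library. The function name is hypothetical, the parsing is deliberately simplified, and the sample is an illustrative snippet, not output captured from the Results API server.

```python
from collections import defaultdict


def summarize_metrics(text):
    """Group samples from Prometheus text exposition output by metric family.

    Returns {family_name: {"type": str, "samples": int}}.
    """
    families = defaultdict(lambda: {"type": "untyped", "samples": 0})
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# TYPE "):
            # Format: "# TYPE <name> <type>"
            _, _, name, kind = line.split(None, 3)
            families[name]["type"] = kind
            continue
        if line.startswith("#"):
            continue  # HELP lines and other comments
        # Sample line: name{labels} value [timestamp]
        name = line.split("{", 1)[0].split(None, 1)[0]
        # Fold histogram/summary series (_bucket, _sum, _count) into their family.
        for suffix in ("_bucket", "_sum", "_count"):
            base = name[: -len(suffix)]
            if (
                name.endswith(suffix)
                and base in families
                and families[base]["type"] in ("histogram", "summary")
            ):
                name = base
                break
        families[name]["samples"] += 1
    return dict(families)


# Illustrative input only; real output will differ.
SAMPLE = """\
# HELP grpc_server_handled_total Total number of RPCs completed on the server.
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="OK",grpc_method="GetResult"} 12
grpc_server_handled_total{grpc_code="NotFound",grpc_method="GetResult"} 3
# TYPE example_request_seconds histogram
example_request_seconds_bucket{le="0.1"} 5
example_request_seconds_bucket{le="+Inf"} 7
example_request_seconds_sum 1.9
example_request_seconds_count 7
"""
```

Calling `summarize_metrics(SAMPLE)` reports one counter family with two samples and one histogram family with four. Running it over the saved `curl` output makes it easier to spot which families are counters only and where a histogram is missing.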
/area roadmap
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale` with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close` with a justification.
If this issue should be exempted, mark the issue as frozen with `/lifecycle frozen` with a justification.
/lifecycle stale
Send feedback to tektoncd/plumbing.
/remove-lifecycle stale
/lifecycle frozen
This is a critical feature set.