Prometheus Metrics Endpoint
In a server/client setup it would be great if Trivy exposed some metrics about the scans that happen on the central server. Some useful metrics for my setup:
- Last DB Update (timestamp)
- Last DB Update Attempt (timestamp)
- Sum of Issues found
- Sum of Issues found, split up by SEVERITY
- Sum of Issues found, split up by source (OS, Python, Node, etc.)
As Trivy is built to scan Docker images, I would suggest providing such metrics via a Prometheus metrics endpoint, because Prometheus is quite popular in the Docker/Kubernetes community.
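For illustration, here is a rough sketch of how metric families like these might be declared with the Prometheus Go client's promauto helpers. The metric names and labels below are made up for the example, not anything Trivy actually exposes:

```go
// Illustrative only: metric families matching the list above.
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Unix timestamps of the last successful DB update and the last attempt.
	lastDBUpdate = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "trivy_db_update_timestamp_seconds",
		Help: "Unix time of the last successful vulnerability DB update.",
	})
	lastDBUpdateAttempt = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "trivy_db_update_attempt_timestamp_seconds",
		Help: "Unix time of the last vulnerability DB update attempt.",
	})
	// Issue counts, split up by severity and by source (OS, Python, Node, ...).
	issuesBySeverity = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "trivy_issues_total",
		Help: "Total number of issues found, by severity.",
	}, []string{"severity"})
	issuesBySource = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "trivy_issues_by_source_total",
		Help: "Total number of issues found, by source.",
	}, []string{"source"})
)
```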
Nice suggestion. I think this improvement can be done step by step. It is not difficult to add a Prometheus metrics endpoint. PRs welcome!
Hi @knqyf263, I am new to open-source development, although I have extensive experience with git and some experience with Golang too. So, can I take on a PR for this issue, considering the "good first issue" label?
Hi @yashvardhan-kukreja, thank you for your interest! Yes, it would be helpful. As a first step, we can just return database information such as `Last DB Update`, as he mentioned.
Here is the server mux. https://github.com/aquasecurity/trivy/blob/master/pkg/rpc/server/listen.go#L61-L79
You can get the database metadata like the following. https://github.com/aquasecurity/trivy/blob/master/internal/operation/operation.go#L84-L93
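A minimal sketch of what that first step could look like, assuming a plain `http.ServeMux` and a hypothetical metadata struct (the real mux and metadata types live in the files linked above):

```go
// Sketch only, not Trivy's actual code: register a /metrics route and expose
// the DB update time as a gauge.
package server

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var lastDBUpdate = promauto.NewGauge(prometheus.GaugeOpts{
	Name: "trivy_db_update_timestamp_seconds",
	Help: "Unix time of the last vulnerability DB update.",
})

// dbMetadata stands in for whatever the metadata lookup in operation.go returns.
type dbMetadata struct {
	UpdatedAt int64 // Unix seconds
}

// newMux records the DB update time and serves all metrics registered with
// the default registry on /metrics.
func newMux(meta dbMetadata) *http.ServeMux {
	lastDBUpdate.Set(float64(meta.UpdatedAt))

	mux := http.NewServeMux()
	mux.Handle("/metrics", promhttp.Handler())
	return mux
}
```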
Hi @knqyf263, sorry, I was caught up with some crucial work for the past month. Now I am back on this.
@knqyf263, I made some mistakes in pull request #540, so I closed it and opened a new PR (#542) for this issue. If you find it suitable, please delete #540. Sorry for the inconvenience.
Hi @yashvardhan-kukreja, this is an OSS project, so you don't have to apologize for not having time to work on this issue. I'm so grateful for your contribution! AFAIK, we can't delete a PR on GitHub. It is enough to close the PR.
@knqyf263, @computeralex92, I have a few basic doubts about this issue. Please clarify them:
- First of all, in the first line, what exactly does the "central server" mean? Does it mean the server/host/computer where the `trivy server --listen` command was executed?
- Are we looking to set up a `GET /metrics` endpoint which would respond with metrics like "Last DB Update" for Prometheus?
- Finally, to implement these custom metrics, the way I look at it, it seems that I would need to use the "promauto" and "prometheus" packages. Am I right?
@yashvardhan-kukreja First of all, thank you for implementing this idea. Unfortunately I had no time in the last few months to do it on my own.
Regarding your questions:
1. First of all, in the first line, what exactly does the "central server" mean? Does it mean the server/host/computer where the `trivy server --listen` command was executed?
Correct.
Use case:
As part of a CI/CD pipeline, I want to monitor the performed scans and the Trivy setup, e.g. via Grafana.
Since the client (within the pipeline) should not download the DB etc., the scan happens on a Trivy server started with `trivy server`.
2. Are we looking to set up a `GET /metrics` endpoint which would respond with metrics like "Last DB Update" for Prometheus?
Correct. The idea behind it is to monitor the status of the DB and, for example, get alerted if the DB gets too old or is not able to update anymore.
3. Finally, to implement these custom metrics, the way I look at it, it seems that I would need to use the "promauto" and "prometheus" packages. Am I right?
No clue, sorry.
@computeralex92 thanks for the quick and descriptive reply. It cleared everything up. No worries regarding the 3rd question; I mainly wanted to confirm the first two. I'll start working on implementing this, @knqyf263 :smile:
@computeralex92 @knqyf263, while thinking about how to export metrics for `Last DB Update`, I came up with this idea. On `GET /metrics`, this would be the output:
DBUpdate{time="2020-06-26 14:54:38.198245437 +0000 UTC"} 1
DBUpdate{time="2020-06-26 14:54:38.698289119 +0000 UTC"} 1
DBUpdate{time="2020-06-26 14:54:39.198286756 +0000 UTC"} 1
So, here, I was using the DBUpdate metric as a counter with "time" as the label. Basically, for every timestamp, a counter for it will be created.
If I implement this, then whenever a DB update occurs in Trivy, for example at 2020-06-26 14:54:38, an entry `DBUpdate{time="2020-06-26 14:54:38"} 1` will be added to the existing DB update metrics.
With that, I believe we could easily fetch the last DB update, and we could even plot all the times a DB update happened and find things like the first DB update, because we would be storing all DB updates for that session in the metrics.
So, should I go ahead and implement this? If not, would you like to suggest another way of storing DB update metrics and exposing them at the `/metrics` endpoint?
Hi,
nice work so far. If I may, a suggestion from the Prometheus standpoint: we had something very similar implemented at work. The problem with putting values like timestamps inside the labels is that it might (or most definitely will) blow up your TSDB. If possible, it might be better to put the timestamp as the metric's value, like:
trivy{action="dbupdate"} 1593184501
You could still see from the metric when the updates happened.
PS: you might also want to check the Prometheus guide about naming conventions, but that's probably more cosmetic ;) https://prometheus.io/docs/practices/naming/
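In the Go client, that shape could look roughly like this (names are illustrative; `SetToCurrentTime` stores the current Unix time as the gauge value):

```go
// Sketch of the suggested shape: one gauge per action, whose value is the
// Unix timestamp of the last occurrence.
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var actionTimestamp = promauto.NewGaugeVec(prometheus.GaugeOpts{
	Name: "trivy_action_timestamp_seconds",
	Help: "Unix time of the last occurrence of an action (e.g. dbupdate).",
}, []string{"action"})

// recordDBUpdate overwrites the single dbupdate series with the current time,
// so the endpoint always exposes only the latest update.
func recordDBUpdate() {
	actionTimestamp.WithLabelValues("dbupdate").SetToCurrentTime()
}
```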
Thanks for the suggestion, @strowi. So, just to confirm: every time a DB update happens, Trivy will just overwrite `trivy{action="dbupdate"}`, so whenever we go to `GET /metrics`, we can simply look at `trivy{action="dbupdate"}` to see the latest DB update (because that would correspond to the overwritten timestamp of the latest DB update). I hope I am right?
@yashvardhan-kukreja yes, you will always get the latest Unix timestamp in a single metric, which gets overwritten. Otherwise, if the labels change, Prometheus sees this as a different metric.
For example, this comes especially into play if you want metrics for images plus the count of vulnerabilities.
Using tagged builds, you will get a metric for a specific image:
trivy_container_issues{image="dr.cooking.net/something/nginx:build-master-777",instance="production",job="trivy_scan",monitor="production",namespace="sth"} 123
But if you update the image (maybe fixing the vulnerabilities), you create another metric:
trivy_container_issues{image="dr.cooking.net/something/nginx:build-master-777",instance="production",job="trivy_scan",monitor="production",namespace="sth"} 123
trivy_container_issues{image="dr.cooking.net/something/nginx:build-master-778",instance="production",job="trivy_scan",monitor="production",namespace="sth"} 10
If you have an alert on this, you will still get alerts for the previous image.
Same principle for DB-updates.
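A rough sketch of that per-image metric in the Go client, along with one possible way to drop the series for a superseded build so stale alerts stop firing. The label set and the bookkeeping here are assumptions for illustration, not how Trivy does it:

```go
// Illustrative per-image issue gauge with cleanup of superseded builds.
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var containerIssues = promauto.NewGaugeVec(prometheus.GaugeOpts{
	Name: "trivy_container_issues",
	Help: "Number of issues found in the last scan of an image.",
}, []string{"image", "namespace"})

// recordScan publishes the count for the new image tag and removes the series
// of the previous tag (if any), so only the current build remains alertable.
func recordScan(newImage, oldImage, namespace string, issues int) {
	containerIssues.WithLabelValues(newImage, namespace).Set(float64(issues))
	if oldImage != "" && oldImage != newImage {
		containerIssues.DeleteLabelValues(oldImage, namespace)
	}
}
```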
This seems like a fabulous approach to me, @strowi, thanks a lot. @computeralex92, @knqyf263, this looks perfect to me, to be honest. What do you think, should I move on to implementing this?
@yashvardhan-kukreja It looks fine to me!
Hi,
I'm very interested in this feature. What exactly is the current state of this issue, and how will it proceed?
Cheers,
Daniel
hi guys, what's the status of this?
Ping! :)
Ping! :)
hi guys, still no updates on this? :( it would be a really helpful feature!
We are interested in this too. Maybe one of our Endava Go developers can create a PR for it.
Ping! :)
Ping !
It is probably not the answer you want, but at the moment we don't have enough maintainers, so we are concentrating our resources on Trivy Operator rather than extending the Trivy server. The operator supports Prometheus. You can use it. We hope for your kind understanding.
For anyone stumbling on this: I threw together a small bash script that checks all images running in a cluster and pushes the metrics to a Pushgateway. It can be adapted for CI and should be pretty straightforward: https://gitlab.com/strowi/trivy-check Maybe it helps someone.
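For reference, a rough Go equivalent of what such a script does: set a gauge and push it to a Pushgateway. The Pushgateway URL, job name, image tag, and issue count below are placeholders:

```go
// Sketch of pushing a scan result to a Pushgateway with client_golang.
package main

import (
	"log"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
)

func main() {
	issues := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "trivy_container_issues",
		Help: "Number of issues found in the last scan of an image.",
	})
	issues.Set(123) // e.g. parsed from a Trivy JSON report

	// Push the metric, grouped by image, to the Pushgateway.
	if err := push.New("http://pushgateway:9091", "trivy_scan").
		Collector(issues).
		Grouping("image", "nginx:build-master-778").
		Push(); err != nil {
		log.Fatal(err)
	}
}
```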