Prometheus Metrics Endpoint

Open computeralex92 opened this issue 4 years ago • 21 comments

In a server/client setup, it would be great if Trivy exposed some metrics about the scans that happen on the central server. Some useful metrics for my implementation:

  • Last DB Update (timestamp)
  • Last DB Update Attempt (timestamp)
  • Sum of Issues found
  • Sum of Issues found, split up by SEVERITY
  • Sum of Issues found, split up by source (OS, Python, Node etc.)

As Trivy is built to scan Docker images, I would suggest providing such metrics via a Prometheus metrics endpoint, because Prometheus is quite popular in the Docker / Kubernetes community.

computeralex92 avatar Dec 29 '19 12:12 computeralex92

Nice suggestion. I think this improvement can be done step by step. It is not difficult to add a Prometheus metrics endpoint. PRs are welcome!

knqyf263 avatar Dec 30 '19 12:12 knqyf263

Hi @knqyf263 , I am new to open source dev although I have had experience working with git extensively and some experience with golang too. So, can I hop on to developing a PR for this issue considering the "good first issue" label?

yashvardhan-kukreja avatar May 08 '20 02:05 yashvardhan-kukreja

Hi @yashvardhan-kukreja, thank you for your interest! Yes, it would be helpful. As a first step, we can just return the database information such as Last DB Update as he mentioned.

Here is the server mux. https://github.com/aquasecurity/trivy/blob/master/pkg/rpc/server/listen.go#L61-L79

You can get the database metadata like the following. https://github.com/aquasecurity/trivy/blob/master/internal/operation/operation.go#L84-L93

knqyf263 avatar May 10 '20 07:05 knqyf263

Hi @knqyf263, sorry, I was caught up with some crucial work for the past month. Now I am back on this.

yashvardhan-kukreja avatar Jun 22 '20 13:06 yashvardhan-kukreja

@knqyf263, I made some mistakes in pull request #540, so I closed it and opened a new PR (#542) for this issue. If you find it suitable, please delete #540. Sorry for the inconvenience.

yashvardhan-kukreja avatar Jun 23 '20 11:06 yashvardhan-kukreja

Hi @yashvardhan-kukreja, this is an OSS project, so you don't have to apologize for not having time to work on this issue. I'm so grateful for your contribution! AFAIK, we can't delete a PR on GitHub. It is enough to close the PR.

knqyf263 avatar Jun 23 '20 11:06 knqyf263

@knqyf263, @computeralex92, I have a few basic questions about this issue. Please clarify them:

  1. So, first of all, in the first line, what exactly does the "central server" mean? Like does it mean the server/host/computer where the trivy server --listen command got executed?
  2. So, here are we looking to setup a GET /metrics endpoint which would return (respond with) metrics like "Last DB Update" for prometheus?
  3. Finally, to implement these custom metrics, the way I look at it, it seems that I would need to utilise the "promauto" and "prometheus" packages. Am I right?

yashvardhan-kukreja avatar Jun 23 '20 12:06 yashvardhan-kukreja

@yashvardhan-kukreja First of all, thank you for implementing this idea. Unfortunately, I had no time in the last months to do it on my own.

Regarding your questions:

1. So, first of all, in the first line, what exactly does the "central server" mean? Like does it mean the server/host/computer where the `trivy server --listen` command got executed?

Correct. Use case: As part of a CI/CD pipeline, I want to monitor the performed scans and the Trivy setup, e.g. via Grafana. Since the client (within the pipeline) should not download the DB etc., the scan happens on a Trivy server started with trivy server.

2. So, here are we looking to setup a `GET /metrics` endpoint which would return (respond with) metrics like "Last DB Update" for prometheus?

Correct. The idea behind it is to monitor the status of the DB and, e.g., be alerted if the DB gets too old or can no longer be updated.

3. Finally, to implement these custom metrics, the way I look at it, it seems that I would need to utilise the "promauto" and "prometheus" packages. Am I right?

No clue, sorry.

computeralex92 avatar Jun 23 '20 19:06 computeralex92

@computeralex92 thanks for the quick and descriptive reply. It cleared everything up. No worries regarding the 3rd question; I mainly wanted to confirm the first two. I'll start working on implementing this, @knqyf263 :smile:

yashvardhan-kukreja avatar Jun 23 '20 20:06 yashvardhan-kukreja

@computeralex92 @knqyf263, while thinking about how to export metrics for Last DB Update, I came up with this idea:

On GET /metrics, this would be the output:

DBUpdate{time="2020-06-26 14:54:38.198245437 +0000 UTC"} 1
DBUpdate{time="2020-06-26 14:54:38.698289119 +0000 UTC"} 1
DBUpdate{time="2020-06-26 14:54:39.198286756 +0000 UTC"} 1

So, here, I am using the DBUpdate metric as a counter with "time" as the label. Basically, a counter is created for every timestamp.

If I implement this, then whenever a DB update occurs in Trivy, for example at 2020-06-26 14:54:38, an entry DBUpdate{time="2020-06-26 14:54:38"} 1 will be added to the existing DB update metrics.

With that, I believe we could easily fetch the Last DB Update. We could also plot all the times a DB update happened and find things like the first DB update, because all the DB updates for that session would be stored in the metrics.

So should I go on and implement this and if not then would you like to suggest any other way of storing DB Update metrics and displaying them at /metrics endpoint?

yashvardhan-kukreja avatar Jun 26 '20 15:06 yashvardhan-kukreja

Hi,

nice work so far. If I might, a suggestion from the Prometheus standpoint: We had something very similar implemented at work. The problem with putting metric values inside the labels is that it might (or most definitely will) blow up your TSDB. If possible, it might be better to use the timestamp as the metric's value, like:

trivy{action="dbupdate"} 1593184501

You could still see from the metrics when the updates did happen?
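
The difference between the two designs can be sketched in a few lines of Go. This is only an illustration: seriesStore is a hypothetical stand-in for how a time-series database keys its data (one series per unique metric name and label set), not Prometheus or Trivy code.

```go
package main

import "fmt"

// seriesStore mimics how a time-series database keys its data: one entry per
// unique combination of metric name and labels. It is a toy model for
// illustration only.
type seriesStore map[string]float64

// recordAsLabel puts the update time into a label, so every update creates a
// brand-new series.
func (s seriesStore) recordAsLabel(unix int64) {
	s[fmt.Sprintf(`DBUpdate{time="%d"}`, unix)] = 1
}

// recordAsValue uses the update time as the sample value, so every update
// overwrites the same series.
func (s seriesStore) recordAsValue(unix int64) {
	s[`trivy{action="dbupdate"}`] = float64(unix)
}

func main() {
	byLabel, byValue := seriesStore{}, seriesStore{}
	for _, ts := range []int64{1593184478, 1593184479, 1593184501} {
		byLabel.recordAsLabel(ts)
		byValue.recordAsValue(ts)
	}
	// Three updates produce three series in one design and a single one in the other.
	fmt.Println(len(byLabel), len(byValue)) // prints "3 1"
}
```

With the label-based design the number of series grows with every update, while the value-based design stays at one series no matter how often the DB is refreshed.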

PS: you might also want to check the Prometheus guide on naming conventions, but that's probably more cosmetic ;) https://prometheus.io/docs/practices/naming/

strowi avatar Jun 26 '20 15:06 strowi

Thanks for the suggestion, @strowi. So, just to confirm: every time a DB update happens, trivy will just overwrite trivy{action="dbupdate"}, so whenever we go to GET /metrics, we can simply look at trivy{action="dbupdate"} to see the latest DB update (because it holds the overwritten timestamp of the latest DB update).

I hope I am right?

yashvardhan-kukreja avatar Jun 26 '20 18:06 yashvardhan-kukreja

@yashvardhan-kukreja yes, you will always get the latest Unix timestamp in a single metric which gets overwritten. Otherwise, if the labels change, Prometheus sees this as a different metric.

For example, this especially comes into play if you want metrics for images plus a count of vulnerabilities:

Using a tagged build, you will get a metric for a specific image:

trivy_container_issues{image="dr.cooking.net/something/nginx:build-master-777",instance="production",job="trivy_scan",monitor="production",namespace="sth"} 123

But if you update the image (maybe fixing the vulnerabilities), you create another metric:

trivy_container_issues{image="dr.cooking.net/something/nginx:build-master-777",instance="production",job="trivy_scan",monitor="production",namespace="sth"} 123
trivy_container_issues{image="dr.cooking.net/something/nginx:build-master-778",instance="production",job="trivy_scan",monitor="production",namespace="sth"} 10

If you have an alert on this, you will still get the alerts for the previous image.

Same principle for DB-updates.
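
As a rough sketch of the overwrite approach for DB updates, the Go snippet below serves a single gauge on GET /metrics using only the standard library. The metric name trivy_db_last_update_timestamp_seconds and the helpers recordDBUpdate and scrape are hypothetical; the output is hand-rendered in the Prometheus text format and does not reflect Trivy's server code or the Prometheus client library.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// lastDBUpdate holds the Unix time (seconds) of the most recent DB update.
// It is overwritten on every update, so only one series is ever exposed.
var lastDBUpdate int64

// recordDBUpdate is a hypothetical hook to call after each successful update.
func recordDBUpdate(unix int64) {
	atomic.StoreInt64(&lastDBUpdate, unix)
}

// metricsHandler writes the gauge in the Prometheus text exposition format.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprintf(w, "# TYPE trivy_db_last_update_timestamp_seconds gauge\n")
	fmt.Fprintf(w, "trivy_db_last_update_timestamp_seconds %d\n", atomic.LoadInt64(&lastDBUpdate))
}

// scrape simulates a GET /metrics request against an in-memory mux and
// returns the response body.
func scrape() string {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", metricsHandler)
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Body.String()
}

func main() {
	recordDBUpdate(1593184478)
	recordDBUpdate(1593184501) // overwrites the previous value
	fmt.Print(scrape())
}
```

A real implementation would probably register a gauge with a Prometheus client library instead of formatting the text by hand; the point here is only that each update replaces the previous value rather than adding a new series.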

strowi avatar Jun 26 '20 18:06 strowi

This seems like a fabulous approach to me @strowi , thanks a lot for this. @computeralex92 , @knqyf263 this seems perfect to me, to be honest. What do you think, should I start moving on to implementing this?

yashvardhan-kukreja avatar Jun 26 '20 19:06 yashvardhan-kukreja

@yashvardhan-kukreja It looks fine to me!

knqyf263 avatar Jun 28 '20 15:06 knqyf263

Hi,

I'm very interested in this feature. What exactly is the current state of the issue, and how will it proceed?

Cheers,

Daniel

dzabel avatar Aug 13 '21 10:08 dzabel

hi guys, what's the status of this?

bygui86 avatar Nov 03 '21 15:11 bygui86

Ping! :)

DracoBlue avatar Dec 12 '21 10:12 DracoBlue

Ping! :)

andrisro avatar Feb 22 '22 15:02 andrisro

hi guys, still no updates on this? :( it would be a really helpful feature!

bygui86 avatar Jul 19 '22 06:07 bygui86

We are interested in this too. Maybe one of our Endava Go developers can create a PR for it.

DracoBlue avatar Jul 28 '22 14:07 DracoBlue

Ping! :)

nthienan avatar Sep 30 '22 16:09 nthienan

Ping !

jc16180 avatar Nov 03 '22 15:11 jc16180

It is probably not the answer you want, but at the moment we don't have enough maintainers, so we are concentrating our resources on Trivy Operator rather than extending the Trivy server. The operator supports Prometheus. You can use it. We hope for your kind understanding.

knqyf263 avatar May 15 '23 15:05 knqyf263

For anyone stumbling on this: I threw together a small bash script that checks all images running in a cluster and pushes the metrics to a Pushgateway. It can be adapted for CI and should be pretty straightforward: https://gitlab.com/strowi/trivy-check Maybe it helps someone.

strowi avatar May 15 '23 16:05 strowi