
Add id and status labels to pipeline and job metrics

Open ErezArbell opened this issue 2 years ago • 9 comments

See details in issue 453. This can be helpful to better filter the queries and also to present more than the last pipeline/job in the dashboard.

ErezArbell avatar May 11 '22 06:05 ErezArbell

Why is joining on gitlab_ci_pipeline_id not sufficient? You can look up how it works in the example dashboards.
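For illustration, a join of that kind might look like the following sketch (metric and label names are assumptions based on the exporter's documented metrics; check your own /metrics output before using it):

```promql
# Hypothetical join: attach the most recent pipeline id to the status metric
# so a dashboard can display which pipeline a given status belongs to.
gitlab_ci_pipeline_status{status="failed"}
  * on(project, ref) group_left()
gitlab_ci_pipeline_id
```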

maciej-gol avatar May 26 '22 21:05 maciej-gol

Why is joining on gitlab_ci_pipeline_id not sufficient? You can look up how it works in the example dashboards.

@maciej-gol In the example dashboards you can see only the latest pipeline/job. You cannot see historic data. An example of something I would like to have: showing all runs of a specific job name during the last week, so you can see when it started to fail. Such things cannot be done without the extra labels this PR adds.

If you have a way to create a dashboard with historical data using the current implementation, I would like to hear it. We need such a dashboard, and I did not find any way to get a list of historic pipelines/jobs with an option to filter.

ErezArbell avatar May 28 '22 11:05 ErezArbell

I understand your issue, as I'm facing it too. Having said that, I don't believe adding labels will solve it on its own. Why? You can already figure out which pipeline the relevant metrics refer to by looking up the pipeline_id metric. Adding labels will only duplicate the exported data, while opening you up to the problem of growing metrics.

First of all, you need to tweak the exporter to crawl all the pipelines, not only the most recent ones. I might be mistaken, but your MR only tackles the labels, not the crawling.

Secondly, the growing-metrics issue. The problem with the Prometheus client library is that it doesn't forget metric labels once observed. This is important because, in the limit, the exporter will present Prometheus with ALL the jobs ever seen, on every scrape. That's the same as just querying your GitLab DB directly. You could restart the exporter, but things get messy when you use Redis for HA.
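To make the growing-metrics point concrete, here is a toy Python model (hypothetical, not the exporter's actual code) of how a client-library registry retains every label set it has ever seen, so labelling jobs by a unique id means every historical job is re-exposed on every scrape:

```python
# Toy model of a Prometheus-style gauge vector: the series map only grows,
# because the client library never forgets a label set once observed.
class GaugeVec:
    def __init__(self, name):
        self.name = name
        self.series = {}  # (sorted label items) -> last observed value

    def set(self, labels, value):
        self.series[tuple(sorted(labels.items()))] = value

    def expose(self):
        # Every series ever set is rendered on every scrape.
        lines = []
        for lbls, val in sorted(self.series.items()):
            label_str = ",".join('{}="{}"'.format(k, v) for k, v in lbls)
            lines.append("{}{{{}}} {}".format(self.name, label_str, val))
        return lines

status = GaugeVec("gitlab_ci_pipeline_job_status")
for job_id in range(3):  # three "historic" jobs, each with a unique id label
    status.set({"job_id": str(job_id), "ref": "main"}, 1)

print(len(status.expose()))  # 3 -- all historical series, on every scrape
```

With unique job_id labels, the number of exposed series grows without bound over the exporter's lifetime; that is the cardinality problem described above.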

Having said all of this, I believe this exporter is not suitable for monitoring the health of your GCI system when you allow more than one pipeline per ref. In that case, I'm currently opting to build the state of ALL the pipelines by querying webhook data (although that's not all of it).

I share your need to track ALL running pipelines, but I'm worried this exporter would need architectural changes to address this need.


maciej-gol avatar May 28 '22 11:05 maciej-gol

Thank you @maciej-gol for the insightful comments.

I might be mistaken, but your MR only tackles the labels, not the crawling.

You are correct. However, this MR does improve the data collection. The way it currently works, the exporter always publishes only the latest job (for example) that has a given set of label values. So in the current implementation, if a new pipeline starts on the same ref before the old one ends, only the job from the new pipeline will be published. This MR adds the pipeline_id and job_id labels, which are unique, so the jobs from the older pipelines will still be published.
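A toy illustration of the overwrite behaviour described above (hypothetical code, not the exporter's): without a unique id label, a newer job replaces the older one under the same label set; with job_id included, both series survive:

```python
# A metrics store keyed by label set. Without job_id in the key, a newer
# job on the same ref overwrites the older one; with job_id, both survive.
def store(jobs, label_keys):
    series = {}
    for job in jobs:
        key = tuple((k, job[k]) for k in label_keys)
        series[key] = job["status"]  # same key -> overwrite
    return series

jobs = [
    {"ref": "main", "job_name": "test", "job_id": "101", "status": "failed"},
    {"ref": "main", "job_name": "test", "job_id": "102", "status": "success"},
]

without_id = store(jobs, ["ref", "job_name"])
with_id = store(jobs, ["ref", "job_name", "job_id"])
print(len(without_id))  # 1 -- the older job was overwritten
print(len(with_id))     # 2 -- both jobs kept as distinct series
```

This is exactly the trade-off discussed in this thread: unique labels preserve history, at the cost of the unbounded series growth described earlier.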

in infinity, the exporter will present Prometheus with ALL the jobs ever seen, on every scrape

You have a good point here. Now that I think about it, that is indeed what should happen, but it is not what I see when I look at the /metrics endpoint. Anyway, it is a good point.

Having said all of this, I believe this exporter is not suitable to monitor the health of your GCI system when you allow more than one pipeline per ref ... I share your need of tracking ALL running pipeline, but I'm worried this exporter would need architectural changes to work to address this need.

I agree. This is not the suitable tool; it was just the closest I found, so I thought to use it. I understand that this PR will not be merged. I will, however, leave it open, since I would like to get a response from the repo owner; maybe he will have a suggestion.

It is strange that no such tool is available for GitLab, which is a popular commercial product.

BTW, what is a "GCI system"?

ErezArbell avatar May 28 '22 12:05 ErezArbell

Since I've been working quite a lot with Gitlab here at Codility, I've started to use GCI in place of Gitlab CI, as it gets tiresome writing the full name over and over :D


maciej-gol avatar May 28 '22 12:05 maciej-gol

@ErezArbell since your use case is monitoring the general success ratio of your jobs (per ref, perhaps), I believe implementing job hooks that simply store success/failure counters would be enough, without opening yourself up to the growing-metrics problem.

You could export job status counters and expose them via gitlab_ci_pipeline_job_status_counter{job_name, ref, project, status}. The fail rate would then be increase(gitlab_ci_pipeline_job_status_counter{status="failed"}[1h]) / (increase(gitlab_ci_pipeline_job_status_counter{status="failed"}[1h]) + increase(gitlab_ci_pipeline_job_status_counter{status="success"}[1h])). Tracking should also be easy if we start with hooks only first.
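A minimal Python sketch of such a counter-based hook handler (field names are assumptions loosely modelled on GitLab job webhook payloads; this is not the exporter's code). Cardinality stays bounded because status takes a fixed set of values and no unique id ever becomes a label:

```python
from collections import Counter

# Hypothetical job-webhook handler: increment one counter per
# (project, ref, job_name, status). No pipeline_id/job_id labels,
# so the number of series is bounded by the set of jobs and statuses.
job_status_counter = Counter()

def handle_job_hook(payload):
    key = (
        payload["project_name"],
        payload["ref"],
        payload["build_name"],
        payload["build_status"],
    )
    job_status_counter[key] += 1

# Two failures and one success for the same job:
for status in ["failed", "failed", "success"]:
    handle_job_hook({"project_name": "demo", "ref": "main",
                     "build_name": "test", "build_status": status})

failed = job_status_counter[("demo", "main", "test", "failed")]
success = job_status_counter[("demo", "main", "test", "success")]
print(failed / (failed + success))  # fail rate: 2/3
```

The PromQL increase()-based fail rate above is the server-side equivalent of this ratio, computed over a time window from the exported counters.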

It might solve my problem (tracking all pending jobs), but I would need to give it more thought.

What do you think?

maciej-gol avatar May 28 '22 20:05 maciej-gol

👋 Hi everyone! This issue is very interesting. We are having the same problem being able to track the final status of all the jobs and their evolution, since, as @ErezArbell comments, the exporter only reports the status of the last job.

I'm going to try running the app with the changes incorporated by @ErezArbell and see if it fixes our problem.

I look forward to the resolution of this issue 🦊

tinchoram avatar May 31 '22 15:05 tinchoram

@tinchoram, I added to the "quickstart" example two dashboards that I created to use those changes:

  • Pipelines History
  • Jobs History

Those dashboards present the full history and also let you filter what is shown by various parameters. As @maciej-gol wrote, this is not production ready, but those dashboards will let you use those changes and see both their benefits and the problems we have, like this issue.

ErezArbell avatar Jun 01 '22 07:06 ErezArbell

@maciej-gol, I do not need the ratios. I need to see the history of pipelines and jobs in a table, with options to filter.

ErezArbell avatar Jun 01 '22 07:06 ErezArbell