telegraf icon indicating copy to clipboard operation
telegraf copied to clipboard

[Inputs.prometheus] Add collection information

Open tguenneguez opened this issue 2 years ago • 15 comments

Use Case

Be able to have information about collect data off this plugin. Add somthing like https://github.com/influxdata/telegraf/tree/master/plugins/inputs/http_response

response_time (float, seconds) content_length (int, response body length) response_status_code_match (int, 0 = mismatch, 1 = match) http_response_code (int, response status code) result_code (int, see below)

Expected behavior

Add new metrics

Actual behavior

No data to know if metrics are collected

Additional info

No response

tguenneguez avatar Apr 17 '23 18:04 tguenneguez

Add new metrics

What problem do these additional metrics solve that using http_response does not already?

powersj avatar Apr 20 '23 18:04 powersj

Indeed we can use the http_response plugin in parallel with the prometheus plugin to have the state of the requests, but that poses several concerns:

  1. a double solicitation of the metric page
  2. the page can respond randomly and the two plugins could see different responses which does not guarantee that the metrics are seen by the prometheus plugin
  3. Each time you use the prometheus plugin, remember to activate the http_response plugin on the same url

tguenneguez avatar Apr 21 '23 10:04 tguenneguez

a double solicitation of the metric page

This is a simple GET request by default to a text file.

the page can respond randomly

This is true even if you pull the metrics into the prometheus plugin

and the two plugins could see different responses which does not guarantee that the metrics are seen by the prometheus plugin

Do you have data backing this up? yes it could happen, but highly unlikely.

Each time you use the prometheus plugin, remember to activate the http_response plugin on the same url

Telegraf is a config driven tool, the cost of using it is creating a config for it.

powersj avatar Apr 21 '23 14:04 powersj

a double solicitation of the metric page

This is a simple GET request by default to a text file.

It depends of the generator of the metric page. Maybe it's a php page that query a database or somethink like dans have a double query increase the charge to the database or the underlying system.

the page can respond randomly

This is true even if you pull the metrics into the prometheus plugin

No, if you have the same source information, it's greater. You can easily analyse the root cause of the problem.

and the two plugins could see different responses which does not guarantee that the metrics are seen by the prometheus plugin

Do you have data backing this up? yes it could happen, but highly unlikely.

We have seen this cas when the metric was scraped during a batch that increase the response time and the timeout is not the same for the both plugin.

Each time you use the prometheus plugin, remember to activate the http_response plugin on the same url

Telegraf is a config driven tool, the cost of using it is creating a config for it.

I know that telegraf have to be configured, but that's no reason not to simplify users' lives.

tguenneguez avatar Apr 25 '23 07:04 tguenneguez

I think this feature could simplify users lives in my opinion

spaz88 avatar Apr 25 '23 08:04 spaz88

@spaz88 and @tguenneguez the issue here is that merging random plugins to simplify the users life will not scale. If we start this another user will request to merge the http_response plugin into the http plugin and into another plugin querying an http endpoint, and so on. This will lead to a massive code duplication so ever bugfix to the http_response plugin needs to be spread to all those locations. This is a maintenance nightmare.

Regarding your points @tguenneguez, let me respond one-by-one:

a double solicitation of the metric page

I don't get why this should be a big issue. If generating the metric is that expensive the provider should maybe invest into cashing or other optimizations.

the page can respond randomly and the two plugins could see different responses which does not guarantee that the metrics are seen by the prometheus plugin

Well that might be the same with the two merged. If the results are that random, you will never have any guarantee about the results as your single query could be the outlier just the same as for multiple queries.

Each time you use the prometheus plugin, remember to activate the http_response plugin on the same url

Seriously? You can create the two in a separate file and just copy this file (changing the server address) or even use a generator to create those configs in a consistent and reliable way.

So far I second @powersj's assesment and don't see any good reason to put that burden on the maintenance of Telegraf. If you really, really, really need this feature, you can still create an external plugin that has the two merged.

srebhan avatar Apr 26 '23 20:04 srebhan

I don't get why this should be a big issue. If generating the metric is that expensive the provider should maybe invest into cashing or other optimizations.

Caching will skew the metric we are trying to collect.

redbaron avatar Jun 05 '23 11:06 redbaron

Good morning,

I understand your point of view of code maintenance. On the other hand, from a user's point of view, it's a shame that when he activates the inputs.prometheus, he doesn't natively have something that tells him if the collection is working (a simple binary thing). If you don't want to include all the indicators, a simple boolean type metric which is 1 if it works and 0 if it doesn't (page inaccessible, incomprehensible format, timeout, etc.). Currently, when you activate the plugin, you have no way of knowing if it works completely from start to finish.

tguenneguez avatar Jun 05 '23 12:06 tguenneguez

There is inputs.internal plugin which reports error per plugin. I don't know how granular it is.

redbaron avatar Jun 05 '23 12:06 redbaron

I agree it would be nice to have a counter in inputs.internal for response_time and content_length for each scraped prometheus URL. That would be easy to implement somewhere around this line.

Hipska avatar Aug 25 '23 10:08 Hipska

@redbaron and @tguenneguez would the suggestion of @Hipska work for you?

srebhan avatar Sep 01 '23 08:09 srebhan

Good morning

It's actually the same idea as mine. Make a small update to the plugin. I would like to make a pull request in this sense, but the idea of evolution must first be validated. Based on the first post on the issue, it seems that this is not the case.

THANKS Thomas

tguenneguez avatar Sep 01 '23 09:09 tguenneguez

@tguenneguez after discussing this within the team, we decided to accept a PR adding the proposed metrics to inputs.prometheus. Looking forward to your PR!

srebhan avatar Oct 31 '23 12:10 srebhan

Thanks, it's good decision for me.

tguenneguez avatar Nov 02 '23 08:11 tguenneguez

Good morning,

Indeed, but that means that the PR will also concern the tests of the plugin/outputs/prometheus_client. On the other hand, in the plugin/outputs/prometheus_client tests, they do an equality test: https://github.com/influxdata/telegraf/blob/master/plugins/outputs/prometheus_client/prometheus_client_v1_test.go#L350 I propose to use regexp to filter in actual the ".prometheus_internal.". Is this a good method? Do you have an alternative idea?

Good afternoon

tguenneguez avatar Jan 16 '24 15:01 tguenneguez