[Inputs.prometheus] Add collection information
Use Case
Be able to have information about collect data off this plugin. Add somthing like https://github.com/influxdata/telegraf/tree/master/plugins/inputs/http_response
response_time (float, seconds) content_length (int, response body length) response_status_code_match (int, 0 = mismatch, 1 = match) http_response_code (int, response status code) result_code (int, see below)
Expected behavior
Add new metrics
Actual behavior
No data to know if metrics are collected
Additional info
No response
Add new metrics
What problem do these additional metrics solve that using http_response does not already?
Indeed we can use the http_response plugin in parallel with the prometheus plugin to have the state of the requests, but that poses several concerns:
- a double solicitation of the metric page
- the page can respond randomly and the two plugins could see different responses which does not guarantee that the metrics are seen by the prometheus plugin
- Each time you use the prometheus plugin, remember to activate the http_response plugin on the same url
a double solicitation of the metric page
This is a simple GET request by default to a text file.
the page can respond randomly
This is true even if you pull the metrics into the prometheus plugin
and the two plugins could see different responses which does not guarantee that the metrics are seen by the prometheus plugin
Do you have data backing this up? yes it could happen, but highly unlikely.
Each time you use the prometheus plugin, remember to activate the http_response plugin on the same url
Telegraf is a config driven tool, the cost of using it is creating a config for it.
a double solicitation of the metric page
This is a simple GET request by default to a text file.
It depends of the generator of the metric page. Maybe it's a php page that query a database or somethink like dans have a double query increase the charge to the database or the underlying system.
the page can respond randomly
This is true even if you pull the metrics into the prometheus plugin
No, if you have the same source information, it's greater. You can easily analyse the root cause of the problem.
and the two plugins could see different responses which does not guarantee that the metrics are seen by the prometheus plugin
Do you have data backing this up? yes it could happen, but highly unlikely.
We have seen this cas when the metric was scraped during a batch that increase the response time and the timeout is not the same for the both plugin.
Each time you use the prometheus plugin, remember to activate the http_response plugin on the same url
Telegraf is a config driven tool, the cost of using it is creating a config for it.
I know that telegraf have to be configured, but that's no reason not to simplify users' lives.
I think this feature could simplify users lives in my opinion
@spaz88 and @tguenneguez the issue here is that merging random plugins to simplify the users life will not scale. If we start this another user will request to merge the http_response plugin into the http plugin and into another plugin querying an http endpoint, and so on. This will lead to a massive code duplication so ever bugfix to the http_response plugin needs to be spread to all those locations. This is a maintenance nightmare.
Regarding your points @tguenneguez, let me respond one-by-one:
a double solicitation of the metric page
I don't get why this should be a big issue. If generating the metric is that expensive the provider should maybe invest into cashing or other optimizations.
the page can respond randomly and the two plugins could see different responses which does not guarantee that the metrics are seen by the prometheus plugin
Well that might be the same with the two merged. If the results are that random, you will never have any guarantee about the results as your single query could be the outlier just the same as for multiple queries.
Each time you use the prometheus plugin, remember to activate the http_response plugin on the same url
Seriously? You can create the two in a separate file and just copy this file (changing the server address) or even use a generator to create those configs in a consistent and reliable way.
So far I second @powersj's assesment and don't see any good reason to put that burden on the maintenance of Telegraf. If you really, really, really need this feature, you can still create an external plugin that has the two merged.
I don't get why this should be a big issue. If generating the metric is that expensive the provider should maybe invest into cashing or other optimizations.
Caching will skew the metric we are trying to collect.
Good morning,
I understand your point of view of code maintenance. On the other hand, from a user's point of view, it's a shame that when he activates the inputs.prometheus, he doesn't natively have something that tells him if the collection is working (a simple binary thing). If you don't want to include all the indicators, a simple boolean type metric which is 1 if it works and 0 if it doesn't (page inaccessible, incomprehensible format, timeout, etc.). Currently, when you activate the plugin, you have no way of knowing if it works completely from start to finish.
There is inputs.internal plugin which reports error per plugin. I don't know how granular it is.
I agree it would be nice to have a counter in inputs.internal for response_time and content_length for each scraped prometheus URL. That would be easy to implement somewhere around this line.
@redbaron and @tguenneguez would the suggestion of @Hipska work for you?
Good morning
It's actually the same idea as mine. Make a small update to the plugin. I would like to make a pull request in this sense, but the idea of evolution must first be validated. Based on the first post on the issue, it seems that this is not the case.
THANKS Thomas
@tguenneguez after discussing this within the team, we decided to accept a PR adding the proposed metrics to inputs.prometheus. Looking forward to your PR!
Thanks, it's good decision for me.
Good morning,
Indeed, but that means that the PR will also concern the tests of the plugin/outputs/prometheus_client. On the other hand, in the plugin/outputs/prometheus_client tests, they do an equality test: https://github.com/influxdata/telegraf/blob/master/plugins/outputs/prometheus_client/prometheus_client_v1_test.go#L350 I propose to use regexp to filter in actual the ".prometheus_internal.". Is this a good method? Do you have an alternative idea?
Good afternoon