Publish all metrics with an initial value

Open jszwedko opened this issue 4 years ago • 4 comments

Current Vector Version

vector 0.12.0 (g27e8f7f x86_64-unknown-linux-gnu 2021-02-22)

Use-cases

It has occurred to me that we don't publish a metric until the corresponding event fires. For example, we never publish processing_errors_total for the json_parser transform until an event actually fails to parse. I believe this can confuse users, who can't tell which set of metrics should be published for a given set of components, and it requires them to convert null values to 0 when making dashboards (in Grafana at least). The latter also makes it impossible to tell whether a metric is genuinely missing or simply hasn't been published yet.

I think this may be one cause of https://github.com/timberio/vector/issues/6530

Example config:

[sources.in]
  type = "stdin"

[sources.metrics]
  type = "internal_metrics"

[transforms.json]
  type = "json_parser"
  inputs = ["in"]

[sinks.blackhole]
  type = "blackhole"
  inputs = ["json"]

[sinks.console]
  type = "console"
  inputs = ["metrics"]
  encoding.codec = "json"

Note that if you only publish valid JSON messages, you will never see

{"name":"processing_errors_total","namespace":"vector","tags":{"component_kind":"transform","component_name":"json","component_type":"json_parser","error_type":"failed_parse"},"timestamp":"2021-[0/1909]:36:33.514508Z","kind":"absolute","counter":{"value": 0}}

in the output. That metric only appears once an event fails to parse as JSON.

Proposal

Ensure that all metrics are published initially with their 0 value.
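
To make the proposal concrete, here is a minimal sketch of what "publish with an initial 0" could look like. This is not Vector's actual internal metrics API: `Registry`, `register`, `increment`, and `snapshot` are hypothetical names used only for illustration. The idea is that a component registers its counters at startup, so a snapshot reports them at 0 before any event fires.

```rust
use std::collections::BTreeMap;

/// Hypothetical counter registry; NOT Vector's real internal metrics API.
#[derive(Default)]
pub struct Registry {
    counters: BTreeMap<String, u64>,
}

impl Registry {
    /// Register a counter at startup so it is reported as 0
    /// before the first event fires. Existing values are left untouched.
    pub fn register(&mut self, name: &str) {
        self.counters.entry(name.to_string()).or_insert(0);
    }

    /// Increment a counter, creating it on first use if it was never registered.
    pub fn increment(&mut self, name: &str) {
        *self.counters.entry(name.to_string()).or_insert(0) += 1;
    }

    /// Snapshot of every known counter, including those still at 0.
    pub fn snapshot(&self) -> Vec<(String, u64)> {
        self.counters.iter().map(|(k, v)| (k.clone(), *v)).collect()
    }
}
```

With this shape, a transform like json_parser would call `register("processing_errors_total")` when it is built, and the metric would show up in the first snapshot even if every event parses successfully.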

References

jszwedko, Feb 22 '21 21:02

There are a couple of complications with this:

  • For metrics that have dynamic tags (like http_client_responses_total, which has the response status as a tag), it's unclear which tag values to publish the initial metric with
  • I don't think it'll be enough to publish just the initial metric; we may need to periodically republish it with a value of 0 to keep the timeseries active in whatever sink the user is using
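
A rough sketch of one way to approach both points, assuming a component can declare some tag values up front. `TaggedCounters`, `seed`, and `flush` are hypothetical names, not anything in Vector today. Declared tag values are seeded at 0, undeclared ones still only appear on first use, and every flush re-emits all known series so idle ones stay active in the sink.

```rust
use std::collections::BTreeMap;

/// Hypothetical key: metric name plus a single tag value (e.g. an HTTP status).
type Key = (String, String);

/// Hypothetical store for tagged counters; NOT Vector's real API.
#[derive(Default)]
pub struct TaggedCounters {
    values: BTreeMap<Key, u64>,
}

impl TaggedCounters {
    /// Seed a counter at 0 for each tag value that is known ahead of time.
    pub fn seed(&mut self, name: &str, known_tag_values: &[&str]) {
        for &tag in known_tag_values {
            self.values
                .entry((name.to_string(), tag.to_string()))
                .or_insert(0);
        }
    }

    /// Increment a series; tag values that were not seeded appear on first use.
    pub fn increment(&mut self, name: &str, tag: &str) {
        *self
            .values
            .entry((name.to_string(), tag.to_string()))
            .or_insert(0) += 1;
    }

    /// Intended to be called on every flush interval: emits all known series,
    /// including idle ones still at 0, so the sink keeps them active.
    pub fn flush(&self) -> Vec<(String, String, u64)> {
        self.values
            .iter()
            .map(|((name, tag), value)| (name.clone(), tag.clone(), *value))
            .collect()
    }
}
```

This only helps for tag values that can be enumerated in advance; a status that was never seeded (say, an unexpected 404) would still first appear with a non-zero value, which is exactly the open question in the first bullet.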

jszwedko, Feb 15 '22 18:02

A good step in the right direction would be to include 0s for the non-component-based metrics listed at https://vector.dev/docs/reference/configuration/sources/internal_metrics/

e.g.

  • https://vector.dev/docs/reference/configuration/sources/internal_metrics/#reloaded_total
  • https://vector.dev/docs/reference/configuration/sources/internal_metrics/#config_load_errors_total

etc., since those are quite important to get right when using rate() in PromQL in the monitoring system
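
To illustrate why the initial 0 matters for rate-style queries, here is a deliberately simplified sketch. `simple_increase` is a hypothetical helper, not PromQL's actual `increase()`: it omits extrapolation and windowing, and only sums the deltas between consecutive samples, treating a drop as a counter reset. A series that first appears with value 1 has no earlier sample to diff against, so its first increment is invisible.

```rust
/// Simplified stand-in for a rate/increase calculation over scraped samples.
/// Sums the difference between consecutive samples; a drop in value is
/// treated as a counter reset, so the new value counts as the increase.
/// This is an approximation for illustration, not PromQL's real algorithm.
pub fn simple_increase(samples: &[u64]) -> u64 {
    samples
        .windows(2)
        .map(|w| if w[1] >= w[0] { w[1] - w[0] } else { w[1] })
        .sum()
}
```

For example, if reloaded_total is first published after one reload, the scraped samples might be `[1, 1, 1]`, and `simple_increase` yields 0. If the counter had been published at 0 beforehand, the samples would be `[0, 1, 1]`, and the reload would be counted.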

nmiculinic, Aug 03 '23 07:08

Just hit the same issue. When could we expect this issue to be fixed?

haiwu, Jul 05 '24 22:07

> Just hit the same issue. When could we expect this issue to be fixed?

It's not currently on the roadmap, so it is difficult to say (contributions are, of course, always welcome). As mentioned above, this is also tricky for metrics that have dynamic tags.

jszwedko, Jul 08 '24 13:07