telegraf wrongly assumes metric is prometheus histogram and breaks itself

Open DaveWK opened this issue 1 year ago • 1 comments

Relevant telegraf.conf

[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  hostname = ""
  omit_hostname = false
[[inputs.influxdb_listener]]
  service_address = "127.0.0.1:8086"

[[outputs.opentelemetry]]

Logs from Telegraf

Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64
Nov 30 15:07:38 telegraf[13175]: 2023-11-30T20:07:38Z W! [outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64

System info

telegraf 1.28.5

Docker

No response

Steps to reproduce

  1. Send some influxdb metrics that include a count field ending in i, for example count=1i
  2. Attempt to export via outputs.opentelemetry
  3. Observe telegraf complain about the incorrect assumption it made with the warning message: "[outputs.opentelemetry] Failed to add point: unsupported histogram count value type int64"
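
For anyone trying to reproduce this, step 1 can be sketched in Go as below. This is only a sketch under assumptions: the measurement name `demo` and the `host` tag are made up, the helpers `buildLine` and `send` are hypothetical, and the address is the `service_address` from the config above. Whether this exact point is enough to trigger the warning has not been confirmed here.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// buildLine returns one line-protocol point whose integer "count" field
// carries the trailing "i" from step 1. The measurement name "demo" and
// the host tag are invented for this example.
func buildLine(count int64) string {
	return fmt.Sprintf("demo,host=example count=%di", count)
}

// send posts the point to the listener's /write endpoint.
func send(baseURL, line string) error {
	resp, err := http.Post(baseURL+"/write", "text/plain", strings.NewReader(line))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	line := buildLine(1)
	fmt.Println(line)
	// Address taken from service_address in the telegraf.conf above.
	if err := send("http://127.0.0.1:8086", line); err != nil {
		fmt.Println("send failed:", err)
	}
}
```

With Telegraf running under the config above, the warning should then show up in the Telegraf log on the next flush if the point is affected.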

Expected behavior

The metric should be converted to a format that is compatible with opentelemetry's specification and sent to the specified output.

Actual behavior

telegraf freaks out about a type assumption that it made on its own and refuses to output via opentelemetry

Additional info

I think the main problem is that the logic for assuming a prometheus histogram is faulty. This also came up in: https://github.com/influxdata/telegraf/pull/12431

I would suggest one of the following:

  1. The logic shouldn't assume it's a prometheus histogram when the count field is of type int64.
  2. The value should be converted to a supported type automatically.
  3. It should be possible in the configuration file to explicitly define the conversion for values of incompatible types.
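
To make the automatic-conversion suggestion concrete, here is a rough Go sketch of what such a coercion could look like. `toFloat64` is a hypothetical helper, not a function from Telegraf or influx2otel, and the assumption that float64 is the type the histogram path wants is inferred only from the warning text.

```go
package main

import "fmt"

// toFloat64 is a hypothetical helper that coerces the numeric types a
// line-protocol field can carry into float64 instead of rejecting them.
// Note: int64/uint64 values above 2^53 lose precision in this conversion.
func toFloat64(v interface{}) (float64, error) {
	switch n := v.(type) {
	case float64:
		return n, nil
	case int64:
		return float64(n), nil
	case uint64:
		return float64(n), nil
	default:
		// Strings, booleans, etc. are still rejected.
		return 0, fmt.Errorf("unsupported count value type %T", v)
	}
}

func main() {
	for _, v := range []interface{}{int64(1), uint64(2), 3.5, "4"} {
		f, err := toFloat64(v)
		fmt.Println(f, err)
	}
}
```

The precision caveat is one reason an explicit, configurable conversion might be preferable to a silent one.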

DaveWK, Nov 30 '23 20:11

Hi,

Thanks for the report.

> telegraf freaks out about a type assumption that it made on its own and refuses to output via opentelemetry

The input, influxdb listener, does not make any assumptions nor does it set any types on metrics. InfluxDB does not have the concept of types, so everything gets recorded as "untyped" in Telegraf, not "histogram".

That error message is coming from a call to "AddPoint", which is from the influx2otel library. I believe the error is returned here, in convertHistogramV1. This is only called if the determined type is histogram, which would mean that a) the type was not untyped, which is clearly wrong, and b) both the count and sum fields exist, which I am guessing is also not the case.

I think what is actually happening here is that our iotas are not lined up between Telegraf and the otel library. It looks like Telegraf's Untyped, 2, is the otel library's Histogram.
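
To illustrate the kind of mismatch I mean, here is a small self-contained Go sketch. The two enums and their orderings are made up for illustration (chosen so that Untyped lands on 2 on one side and Histogram on 2 on the other); they are not copied from Telegraf or the otel library, and `castType` and `mapType` are hypothetical names.

```go
package main

import "fmt"

// AgentType stands in for the sending side's metric type enum.
// Ordering is illustrative only.
type AgentType int

const (
	AgentCounter   AgentType = iota // 0
	AgentGauge                      // 1
	AgentUntyped                    // 2
	AgentHistogram                  // 3
)

// LibType stands in for the receiving library's enum, declared in a
// different order, so the same number means a different type.
type LibType int

const (
	LibUntyped   LibType = iota // 0
	LibGauge                    // 1
	LibHistogram                // 2
	LibSum                      // 3
)

// castType converts by raw number: the suspected bug pattern.
func castType(t AgentType) LibType { return LibType(t) }

// mapType converts by meaning, so it survives reordered constants.
func mapType(t AgentType) LibType {
	switch t {
	case AgentCounter:
		return LibSum
	case AgentGauge:
		return LibGauge
	case AgentHistogram:
		return LibHistogram
	default:
		return LibUntyped
	}
}

func main() {
	fmt.Println("cast untyped is histogram:", castType(AgentUntyped) == LibHistogram)
	fmt.Println("mapped untyped is untyped:", mapType(AgentUntyped) == LibUntyped)
}
```

If something like the raw cast is what happens between the two code bases, an explicit mapping along the lines of `mapType` would keep an untyped metric from being routed into the histogram conversion.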

@jacobmarble is this something you could please chime in on from the otel library perspective?

Thanks!

powersj, Nov 30 '23 22:11