opentelemetry metrics are not decoded / encoded as expected

Open mrugeshmaster opened this issue 1 year ago • 3 comments

Bug Report

Describe the bug

I am using OpenTelemetry manual instrumentation in a Java application to create metrics. The application runs on a Kubernetes cluster. I am using a fluent-bit DaemonSet as middleware to collect these metrics and forward them to an otel-collector deployed outside the cluster, using the opentelemetry input and output plugins. The fluent-bit version is 2.1.9.

All the metrics have a prefix _ added by fluent-bit. The counter metrics have values on the order of e-312 instead of the actual counter values.

If I forward them using otel-collector instead of fluent-bit, I do not see the prefix _ and the counters are incremented as expected.

To Reproduce

  • Example log message:
2023-10-23T23:18:33.713354000Z _process_start_time_seconds = 8.3871636015165086e-312
2023-10-23T23:18:33.713354000Z _process_cpu_seconds_total = 1.214484996996431e-310
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.max.bytes{area="heap",pool="G1 Old Gen"} = 4294967296
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.max.bytes{area="non_heap",pool="Compressed Class Space"} = 1073741824
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.max.bytes{area="non_heap",pool="CodeHeap 'non-profiled nmethods'"} = 122916864
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.max.bytes{area="non_heap",pool="CodeHeap 'non-nmethods'"} = 5828608
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.max.bytes{area="non_heap",pool="CodeHeap 'profiled nmethods'"} = 122912768
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.init.bytes{area="heap",pool="G1 Old Gen"} = 1017118720
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.init.bytes{area="non_heap",pool="Compressed Class Space"} = 0
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.init.bytes{area="non_heap",pool="CodeHeap 'non-profiled nmethods'"} = 2555904
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.init.bytes{area="non_heap",pool="CodeHeap 'non-nmethods'"} = 2555904
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.init.bytes{area="heap",pool="G1 Eden Space"} = 56623104
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.init.bytes{area="non_heap",pool="Metaspace"} = 0
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.init.bytes{area="non_heap",pool="CodeHeap 'profiled nmethods'"} = 2555904
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.init.bytes{area="heap",pool="G1 Survivor Space"} = 0
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.collection.used.bytes{area="heap",pool="G1 Old Gen"} = 0
2023-10-23T23:18:33.713354000Z _jvm.memory.pool.collection.used.bytes{area="heap",pool="G1 Eden Space"} = 0
  • Steps to reproduce the problem:
      • Add metrics to the application using OpenTelemetry manual instrumentation.
      • Deploy the fluent-bit DaemonSet.
      • Add the fluent-bit service endpoint to the OpenTelemetry HTTP metric exporter.
      • Add the opentelemetry input plugin to fluent-bit to ingest the metrics.

Expected behavior

Metric names should not have the prefix _, and counter instruments should increment as integers [0,1,2,3,...,n].
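The two symptoms described here can be checked mechanically against the stdout output. The sketch below is only an illustration: the line layout (timestamp, metric name with optional labels, " = ", value) is inferred from the sample lines in this report, and the regular expression and the `check_line` helper are hypothetical names, not part of fluent-bit. It flags a leading underscore in the metric name and any non-zero value smaller than the smallest normal double, which is the range the broken counter values fall into.

```python
import re
import sys

# Layout assumed from the sample lines in this issue:
#   <timestamp> <name>{<labels>} = <value>
LINE_RE = re.compile(r"^(\S+) (.*) = (\S+)$")


def check_line(line: str) -> dict:
    """Parse one stdout metric line and flag the two symptoms from this report."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized metric line: {line!r}")
    _timestamp, name, raw_value = match.groups()
    value = float(raw_value)
    return {
        "name": name,
        "value": value,
        # Symptom 1: the metric name starts with an underscore.
        "underscore_prefix": name.startswith("_"),
        # Symptom 2: a non-zero value below the smallest normal double,
        # which a real counter or byte count would never produce.
        "subnormal": 0.0 < abs(value) < sys.float_info.min,
    }


if __name__ == "__main__":
    lines = [
        "2023-10-23T23:18:33.713354000Z _process_start_time_seconds = 8.3871636015165086e-312",
        '2023-10-23T23:18:33.713354000Z _jvm.memory.pool.max.bytes{area="heap",pool="G1 Old Gen"} = 4294967296',
    ]
    for line in lines:
        print(check_line(line))
```

On the sample lines from this report, both names are flagged for the underscore prefix, while only `_process_start_time_seconds` is flagged as subnormal.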

Screenshots

Your Environment

  • Version used: 2.1.9

  • Environment name and version (e.g. Kubernetes? What version?): Kubernetes - v1.26.5

  • Filters and plugins:

    fluent-bit.conf: |-
      [SERVICE]
          Flush         1
          Daemon        Off
          Log_Level     debug
          Parsers_File  parsers.conf
          Parsers_File  custom_parsers.conf
          HTTP_Server   On
          HTTP_Listen   0.0.0.0
          HTTP_Port     2020

      [INPUT]
          Name  opentelemetry
          Host  0.0.0.0
          Port  4318
          Tag   otel

      [OUTPUT]
          Name   stdout
          Match  otel

Additional context

I am generating custom metrics in a Java application using OpenTelemetry. I am using fluent-bit as middleware to ingest the metrics and forward them to an otel-collector / Prometheus server located outside the environment.

mrugeshmaster, Oct 23 '23 23:10

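I have this issue in my environment as well. Almost every integer-valued metric is printed as a vanishingly small number (exponents between e-313 and e-324) instead of its real value, while metrics that are already floating point, such as the CPU load, look correct. If I go straight to the collector, I don't have this issue. My environment and configuration are nearly identical to the reporter's.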

Version used: 2.2.0

Deployment: Kubernetes

Configuration:

    service: |
      [SERVICE]
          Flush        1
          Log_Level    info
          HTTP_Server  On
          HTTP_Listen  0.0.0.0
          HTTP_Port    2020

    inputs: |
      [INPUT]
          name    opentelemetry
          listen  0.0.0.0
          port    4318
          Tag     metrics

    outputs: |
      [OUTPUT]
          Name                  opentelemetry
          Match                 *
          Host
          Port                  443
          Log_response_payload  True
          metrics_uri           /v1/metrics
          logs_uri              /v1/logs
          traces_uri            /v1/traces
          Tls                   On
          Tls.verify            Off
          compress              gzip

      [OUTPUT]
          Name   stdout
          Match  *

Output from the stdout plugin:

2023-11-14T19:28:51.936224000Z _ci.pipeline.run.completed = 4.9406564584124654e-324
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.max{pool="G1 Old Gen",type="heap"} = 2.1219957909652723e-314
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.max{pool="CodeHeap 'non-profiled nmethods'",type="non_heap"} = 6.0728999796940667e-316
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.max{pool="Compressed Class Space",type="non_heap"} = 5.3049894774131808e-315
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.max{pool="CodeHeap 'non-nmethods'",type="non_heap"} = 2.8797149758754563e-317
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.max{pool="CodeHeap 'profiled nmethods'",type="non_heap"} = 6.0726976104055301e-316
2023-11-14T19:28:51.936224000Z _runtime.jvm.gc.count{gc="G1 Old Generation"} = 0
2023-11-14T19:28:51.936224000Z _runtime.jvm.gc.count{gc="G1 Young Generation"} = 3.9031186021458477e-322
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.init{pool="G1 Old Gen",type="heap"} = 2.010093669176088e-314
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.init{pool="CodeHeap 'non-profiled nmethods'",type="non_heap"} = 1.2627843604682254e-317
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.init{pool="G1 Eden Space",type="heap"} = 1.1190212178918428e-315
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.init{pool="Compressed Class Space",type="non_heap"} = 0
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.init{pool="Metaspace",type="non_heap"} = 0
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.init{pool="G1 Survivor Space",type="heap"} = 0
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.init{pool="CodeHeap 'non-nmethods'",type="non_heap"} = 1.2627843604682254e-317
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.init{pool="CodeHeap 'profiled nmethods'",type="non_heap"} = 1.2627843604682254e-317
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.usage{pool="G1 Old Gen",type="heap"} = 3.7305941987392245e-315
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.usage{pool="CodeHeap 'non-profiled nmethods'",type="non_heap"} = 4.8705228518540088e-317
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.usage{pool="G1 Eden Space",type="heap"} = 8.2994073660311676e-315
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.usage{pool="Compressed Class Space",type="non_heap"} = 8.3767012090806656e-317
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.usage{pool="Metaspace",type="non_heap"} = 6.6170575579833489e-316
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.usage{pool="G1 Survivor Space",type="heap"} = 6.2167845438435712e-317
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.usage{pool="CodeHeap 'non-nmethods'",type="non_heap"} = 1.2756221622097644e-317
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.usage{pool="CodeHeap 'profiled nmethods'",type="non_heap"} = 1.549940056861358e-316
2023-11-14T19:28:51.936224000Z _ci.pipeline.run.started = 4.9406564584124654e-324
2023-11-14T19:28:51.936224000Z _runtime.jvm.gc.time{gc="G1 Old Generation"} = 0
2023-11-14T19:28:51.936224000Z _runtime.jvm.gc.time{gc="G1 Young Generation"} = 1.7025502155689356e-320
2023-11-14T19:28:51.936224000Z _jenkins.queue.time_spent_millis = 3.6788127989339218e-320
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.committed{pool="G1 Old Gen",type="heap"} = 7.853871140389045e-315
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.committed{pool="CodeHeap 'non-profiled nmethods'",type="non_heap"} = 4.889242011043642e-317
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.committed{pool="G1 Eden Space",type="heap"} = 1.3303918923825242e-314
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.committed{pool="Compressed Class Space",type="non_heap"} = 1.018929367781653e-316
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.committed{pool="Metaspace",type="non_heap"} = 7.135945852376693e-316
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.committed{pool="G1 Survivor Space",type="heap"} = 6.2167845438435712e-317
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.committed{pool="CodeHeap 'non-nmethods'",type="non_heap"} = 1.3275425327999293e-317
2023-11-14T19:28:51.936224000Z _process.runtime.jvm.memory.committed{pool="CodeHeap 'profiled nmethods'",type="non_heap"} = 1.5509582273443076e-316
2023-11-14T19:28:51.936224000Z _ci.pipeline.run.launched = 4.9406564584124654e-324
2023-11-14T19:28:51.936224000Z _jenkins.scm.event.pool_size = 0
2023-11-14T19:28:51.936224000Z _jenkins.scm.event.active_threads = 0
2023-11-14T19:28:51.936224000Z _process.cpu.time = 6.9984398733412573e-313
2023-11-14T19:28:51.936224000Z _jenkins.scm.event.completed_tasks = 0
2023-11-14T19:28:51.936224000Z _jenkins.queue.left = 4.9406564584124654e-324
2023-11-14T19:28:51.936224000Z _jenkins.scm.event.queued_tasks = 0
2023-11-14T19:28:51.936224000Z _jenkins.queue.buildable = 0
2023-11-14T19:28:51.936224000Z _system.paging.utilization = 0
2023-11-14T19:28:51.936224000Z _system.memory.usage{state="free"} = 1.2818394526768293e-314
2023-11-14T19:28:51.936224000Z _system.memory.usage{state="used"} = 2.9621521292537154e-314
2023-11-14T19:28:51.936224000Z _jenkins.queue.blocked = 0
2023-11-14T19:28:51.936224000Z _jenkins.agents.total = 5.9287877500949585e-323
2023-11-14T19:28:51.936224000Z _system.cpu.load.average.1m = 8.5700000000000003
2023-11-14T19:28:51.936224000Z _process.cpu.load = 0.64503642987249543
2023-11-14T19:28:51.936224000Z _jenkins.agents.online = 4.9406564584124654e-324
2023-11-14T19:28:51.936224000Z _jenkins.agents.offline = 5.434722104253712e-323
2023-11-14T19:28:51.936224000Z _system.cpu.load = 0.30422213217391303
2023-11-14T19:28:51.936224000Z _jenkins.queue.waiting = 0
2023-11-14T19:28:51.936224000Z _system.paging.usage{state="free"} = 0
2023-11-14T19:28:51.936224000Z _system.paging.usage{state="used"} = 0
2023-11-14T19:28:51.936224000Z _system.memory.utilization = 0.69796371459960938
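
The tiny values in this output are consistent with 64-bit integer data points being read back as IEEE 754 doubles: 4.9406564584124654e-324 is the smallest positive double, whose bit pattern is the integer 1. This is an inference from the numbers alone, not something confirmed in this thread. (OTLP number data points can carry either an integer or a double, which is where such a mix-up could arise.) The sketch below uses a hypothetical helper, `double_bits_as_int`, to reinterpret a few of the printed values and recover the integers they appear to encode.

```python
import struct


def double_bits_as_int(x: float) -> int:
    """Reinterpret the 8 bytes of an IEEE 754 double as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


def int_bits_as_double(n: int) -> float:
    """Inverse direction: read the bytes of a signed 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<q", n))[0]


# Values copied from the stdout output above.
samples = {
    "_ci.pipeline.run.completed": 4.9406564584124654e-324,
    "_jenkins.agents.total": 5.9287877500949585e-323,
    "_jenkins.agents.online": 4.9406564584124654e-324,
    "_jenkins.agents.offline": 5.434722104253712e-323,
    '_process.runtime.jvm.memory.max{pool="G1 Old Gen",type="heap"}': 2.1219957909652723e-314,
}

# Recover the integer that each printed double appears to encode.
decoded = {name: double_bits_as_int(value) for name, value in samples.items()}

if __name__ == "__main__":
    for name, value in decoded.items():
        print(f"{name} -> {value}")
```

Decoded this way, the agent counts become 12 total, 1 online and 11 offline, which add up, and the G1 Old Gen maximum becomes 4294967296 bytes (4 GiB). Metrics that were doubles to begin with, such as `_system.cpu.load`, are printed correctly, which fits the same reading.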

aidanleuck, Nov 14 '23 17:11

We also experience the same issue with data received from OTLP.

danifv, Feb 05 '24 14:02

Any update here?

mrugeshmaster, Mar 13 '24 22:03

The math fix seems to work for opentelemetry metrics. I am able to see the metrics as expected using fluent-bit v3.0.3. Thank you for working on this bug. Closing the bug.

mrugeshmaster, May 23 '24 17:05