in_prometheus_remote_write: Implement handler of payloads of prometheus remote write protocol

Open cosmo0920 opened this issue 2 months ago • 4 comments


Enter [N/A] in the box, if an item is not applicable to your change.

Testing

Before we can approve your change; please submit the following in a comment:

  • [x] Example configuration file for the change
$  bin/fluent-bit -i prometheus_remote_write -pport=8080 -phttp2=off -o stdout -v
  • [x] Debug log output from testing the change
Fluent Bit v3.0.3
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

___________.__                        __    __________.__  __          ________  
\_   _____/|  |  __ __   ____   _____/  |_  \______   \__|/  |_  ___  _\_____  \ 
 |    __)  |  | |  |  \_/ __ \ /    \   __\  |    |  _/  \   __\ \  \/ / _(__  < 
 |     \   |  |_|  |  /\  ___/|   |  \  |    |    |   \  ||  |    \   / /       \
 \___  /   |____/____/  \___  >___|  /__|    |______  /__||__|     \_/ /______  /
     \/                     \/     \/               \/                        \/ 

[2024/04/19 20:24:33] [ info] Configuration:
[2024/04/19 20:24:33] [ info]  flush time     | 1.000000 seconds
[2024/04/19 20:24:33] [ info]  grace          | 5 seconds
[2024/04/19 20:24:33] [ info]  daemon         | 0
[2024/04/19 20:24:33] [ info] ___________
[2024/04/19 20:24:33] [ info]  inputs:
[2024/04/19 20:24:33] [ info]      prometheus_remote_write
[2024/04/19 20:24:33] [ info] ___________
[2024/04/19 20:24:33] [ info]  filters:
[2024/04/19 20:24:33] [ info] ___________
[2024/04/19 20:24:33] [ info]  outputs:
[2024/04/19 20:24:33] [ info]      stdout.0
[2024/04/19 20:24:33] [ info] ___________
[2024/04/19 20:24:33] [ info]  collectors:
[2024/04/19 20:24:33] [ info] [fluent bit] version=3.0.3, commit=ec1c65f73a, pid=186265
[2024/04/19 20:24:33] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2024/04/19 20:24:33] [ info] [storage] ver=1.1.6, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/04/19 20:24:33] [ info] [cmetrics] version=0.7.3
[2024/04/19 20:24:33] [ info] [ctraces ] version=0.4.0
[2024/04/19 20:24:33] [ info] [input:prometheus_remote_write:prometheus_remote_write.0] initializing
[2024/04/19 20:24:33] [ info] [input:prometheus_remote_write:prometheus_remote_write.0] storage_strategy='memory' (memory only)
[2024/04/19 20:24:33] [ info] [output:stdout:stdout.0] worker #0 started
[2024/04/19 20:24:33] [debug] [prometheus_remote_write:prometheus_remote_write.0] created event channels: read=21 write=22
[2024/04/19 20:24:33] [debug] [downstream] listening on 0.0.0.0:8080
[2024/04/19 20:24:33] [ info] [input:prometheus_remote_write:prometheus_remote_write.0] listening on 0.0.0.0:8080
[2024/04/19 20:24:33] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2024/04/19 20:24:33] [ info] [sp] stream processor started
[2024/04/19 20:24:38] [debug] [task] created task=0x5faf030 id=0 OK
2024-04-19T11:24:36.216000000Z fluentbit_uptime{__name__="fluentbit_uptime",hostname="cosmo-desktop2"} = 1
2024-04-19T11:24:35.170000000Z fluentbit_input_bytes_total{__name__="fluentbit_input_bytes_total",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_records_total{__name__="fluentbit_input_records_total",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:36.216000000Z fluentbit_input_metrics_scrapes_total{__name__="fluentbit_input_metrics_scrapes_total",name="fluentbit_metrics.0"} = 1
2024-04-19T11:24:35.170000000Z fluentbit_output_proc_records_total{__name__="fluentbit_output_proc_records_total",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_output_proc_bytes_total{__name__="fluentbit_output_proc_bytes_total",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_output_errors_total{__name__="fluentbit_output_errors_total",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_output_retries_total{__name__="fluentbit_output_retries_total",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_output_retries_failed_total{__name__="fluentbit_output_retries_failed_total",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_output_dropped_records_total{__name__="fluentbit_output_dropped_records_total",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_output_retried_records_total{__name__="fluentbit_output_retried_records_total",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:36.216000000Z fluentbit_process_start_time_seconds{__name__="fluentbit_process_start_time_seconds",hostname="cosmo-desktop2"} = 1713525875
2024-04-19T11:24:36.216000000Z fluentbit_build_info{__name__="fluentbit_build_info",hostname="cosmo-desktop2",version="3.0.3",os="linux"} = 1713525875
2024-04-19T11:24:36.216000000Z fluentbit_hot_reloaded_times{__name__="fluentbit_hot_reloaded_times",hostname="cosmo-desktop2"} = 0
2024-04-19T11:24:36.216000000Z fluentbit_storage_chunks{__name__="fluentbit_storage_chunks"} = 0
2024-04-19T11:24:36.216000000Z fluentbit_storage_mem_chunks{__name__="fluentbit_storage_mem_chunks"} = 0
2024-04-19T11:24:36.216000000Z fluentbit_storage_fs_chunks{__name__="fluentbit_storage_fs_chunks"} = 0
2024-04-19T11:24:36.216000000Z fluentbit_storage_fs_chunks_up{__name__="fluentbit_storage_fs_chunks_up"} = 0
2024-04-19T11:24:36.216000000Z fluentbit_storage_fs_chunks_down{__name__="fluentbit_storage_fs_chunks_down"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_ingestion_paused{__name__="fluentbit_input_ingestion_paused",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_overlimit{__name__="fluentbit_input_storage_overlimit",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_memory_bytes{__name__="fluentbit_input_storage_memory_bytes",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_chunks{__name__="fluentbit_input_storage_chunks",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_chunks_up{__name__="fluentbit_input_storage_chunks_up",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_chunks_down{__name__="fluentbit_input_storage_chunks_down",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_chunks_busy{__name__="fluentbit_input_storage_chunks_busy",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_chunks_busy_bytes{__name__="fluentbit_input_storage_chunks_busy_bytes",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_output_upstream_total_connections{__name__="fluentbit_output_upstream_total_connections",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_output_upstream_busy_connections{__name__="fluentbit_output_upstream_busy_connections",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_output_chunk_available_capacity_percent{__name__="fluentbit_output_chunk_available_capacity_percent",name="prometheus_remote_write.0"} = 100
[2024/04/19 20:24:38] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2024/04/19 20:24:38] [debug] [out flush] cb_destroy coro_id=0
[2024/04/19 20:24:38] [debug] [task] destroy task=0x5faf030 (task_id=0)
2024-04-19T11:24:38.216000000Z fluentbit_uptime{__name__="fluentbit_uptime",hostname="cosmo-desktop2"} = 3
2024-04-19T11:24:35.170000000Z fluentbit_input_bytes_total{__name__="fluentbit_input_bytes_total",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_records_total{__name__="fluentbit_input_records_total",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:38.216000000Z fluentbit_input_metrics_scrapes_total{__name__="fluentbit_input_metrics_scrapes_total",name="fluentbit_metrics.0"} = 2
2024-04-19T11:24:37.278000000Z fluentbit_output_proc_records_total{__name__="fluentbit_output_proc_records_total",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:37.278000000Z fluentbit_output_proc_bytes_total{__name__="fluentbit_output_proc_bytes_total",name="prometheus_remote_write.0"} = 6175
2024-04-19T11:24:35.170000000Z fluentbit_output_errors_total{__name__="fluentbit_output_errors_total",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_output_retries_total{__name__="fluentbit_output_retries_total",name="prometheus_remote_write.0"} = 0
[2024/04/19 20:24:39] [debug] [task] created task=0x6180f00 id=0 OK
2024-04-19T11:24:35.170000000Z fluentbit_output_retries_failed_total{__name__="fluentbit_output_retries_failed_total",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_output_dropped_records_total{__name__="fluentbit_output_dropped_records_total",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_output_retried_records_total{__name__="fluentbit_output_retried_records_total",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:38.216000000Z fluentbit_process_start_time_seconds{__name__="fluentbit_process_start_time_seconds",hostname="cosmo-desktop2"} = 1713525875
2024-04-19T11:24:38.216000000Z fluentbit_build_info{__name__="fluentbit_build_info",hostname="cosmo-desktop2",version="3.0.3",os="linux"} = 1713525875
2024-04-19T11:24:38.216000000Z fluentbit_hot_reloaded_times{__name__="fluentbit_hot_reloaded_times",hostname="cosmo-desktop2"} = 0
2024-04-19T11:24:38.216000000Z fluentbit_storage_chunks{__name__="fluentbit_storage_chunks"} = 0
2024-04-19T11:24:38.216000000Z fluentbit_storage_mem_chunks{__name__="fluentbit_storage_mem_chunks"} = 0
2024-04-19T11:24:38.216000000Z fluentbit_storage_fs_chunks{__name__="fluentbit_storage_fs_chunks"} = 0
2024-04-19T11:24:38.216000000Z fluentbit_storage_fs_chunks_up{__name__="fluentbit_storage_fs_chunks_up"} = 0
2024-04-19T11:24:38.216000000Z fluentbit_storage_fs_chunks_down{__name__="fluentbit_storage_fs_chunks_down"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_ingestion_paused{__name__="fluentbit_input_ingestion_paused",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_overlimit{__name__="fluentbit_input_storage_overlimit",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_memory_bytes{__name__="fluentbit_input_storage_memory_bytes",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_chunks{__name__="fluentbit_input_storage_chunks",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_chunks_up{__name__="fluentbit_input_storage_chunks_up",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_chunks_down{__name__="fluentbit_input_storage_chunks_down",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_chunks_busy{__name__="fluentbit_input_storage_chunks_busy",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:35.170000000Z fluentbit_input_storage_chunks_busy_bytes{__name__="fluentbit_input_storage_chunks_busy_bytes",name="fluentbit_metrics.0"} = 0
2024-04-19T11:24:37.216000000Z fluentbit_output_upstream_total_connections{__name__="fluentbit_output_upstream_total_connections",name="prometheus_remote_write.0"} = 1
2024-04-19T11:24:37.278000000Z fluentbit_output_upstream_busy_connections{__name__="fluentbit_output_upstream_busy_connections",name="prometheus_remote_write.0"} = 0
2024-04-19T11:24:37.278000000Z fluentbit_output_chunk_available_capacity_percent{__name__="fluentbit_output_chunk_available_capacity_percent",name="prometheus_remote_write.0"} = 100
[2024/04/19 20:24:39] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2024/04/19 20:24:39] [debug] [task] destroy task=0x6180f00 (task_id=0)
[2024/04/19 20:24:39] [debug] [out flush] cb_destroy coro_id=1
^C[2024/04/19 20:24:40] [engine] caught signal (SIGINT)
[2024/04/19 20:24:40] [ warn] [engine] service will shutdown in max 5 seconds
[2024/04/19 20:24:40] [ info] [engine] service has stopped (0 pending tasks)
[2024/04/19 20:24:40] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2024/04/19 20:24:40] [ info] [output:stdout:stdout.0] thread worker #0 stopped
  • [x] Attached Valgrind output that shows no leaks or memory corruption was found
==186265== 
==186265== HEAP SUMMARY:
==186265==     in use at exit: 0 bytes in 0 blocks
==186265==   total heap usage: 7,158 allocs, 7,158 frees, 3,974,699 bytes allocated
==186265== 
==186265== All heap blocks were freed -- no leaks are possible
==186265== 
==186265== For lists of detected and suppressed errors, rerun with: -s
==186265== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [ ] Run local packaging test showing all targets (including any new ones) build.
  • [ ] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [x] Documentation required for this feature

https://github.com/fluent/fluent-bit-docs/pull/1363

Backporting

  • [ ] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

cosmo0920, Apr 17 '24 05:04

@cosmo0920 I see the metrics generated have an _ as a prefix, e.g:

2024-04-17T05:13:34.937000000Z _fluentbit_uptime{__name__="fluentbit_uptime",hostname="cosmo-desktop2"} = 1

Maybe it is something around the definition of the metric's name, subsystem, etc.?
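
To spot this kind of mismatch across a whole capture, the stdout lines can be checked mechanically. The following is only a minimal sketch, assuming the text layout shown in the logs above (`timestamp name{labels} = value`); the helper names `parse_metric_line` and `name_mismatch` are made up for illustration and are not part of Fluent Bit or cmetrics.

```python
import re

# Layout assumed from the stdout output in this thread:
#   <timestamp> <name>{<label>="<value>",...} = <value>
LINE_RE = re.compile(
    r'^(?P<ts>\S+) (?P<name>[^{\s]+)\{(?P<labels>.*)\} = (?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metric_line(line):
    """Split one stdout metric line into (timestamp, name, labels, value)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # not a metric line, e.g. a [debug] log record
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("ts"), match.group("name"), labels, match.group("value")


def name_mismatch(line):
    """Return True when the printed name differs from the __name__ label."""
    parsed = parse_metric_line(line)
    if parsed is None:
        return False
    _, name, labels, _ = parsed
    return "__name__" in labels and labels["__name__"] != name


sample = ('2024-04-17T05:13:34.937000000Z _fluentbit_uptime'
          '{__name__="fluentbit_uptime",hostname="cosmo-desktop2"} = 1')
print(name_mismatch(sample))  # True
```

Piping the stdout output through `name_mismatch` line by line would list every metric whose printed name and `__name__` label disagree.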

edsiper, Apr 18 '24 19:04

> @cosmo0920 I see the metrics generated have an _ as a prefix, e.g:
>
> 2024-04-17T05:13:34.937000000Z _fluentbit_uptime{__name__="fluentbit_uptime",hostname="cosmo-desktop2"} = 1
>
> Maybe it is something around the definition of the metric's name, subsystem, etc.?

This could be generated by cmetrics, but I'm not sure why. I just followed how the OpenTelemetry payloads are decoded.

cosmo0920, Apr 19 '24 02:04

I got it.

> > @cosmo0920 I see the metrics generated have an _ as a prefix, e.g:
> >
> > 2024-04-17T05:13:34.937000000Z _fluentbit_uptime{__name__="fluentbit_uptime",hostname="cosmo-desktop2"} = 1
> >
> > Maybe it is something around the definition of the metric's name, subsystem, etc.?
>
> This could be generated by cmetrics, but I'm not sure why. I just followed how the OpenTelemetry payloads are decoded.

This was generated from cmetrics' msgpack decoder: https://github.com/fluent/cmetrics/pull/201/commits/8cb2739af017cd9d67f9f86b1c784487f4c6a13b

cosmo0920, Apr 19 '24 11:04

I also confirmed this with Prometheus' remote write settings and node_exporter:

$ bin/fluent-bit -i prometheus_remote_write -pport=8080 -puri=/api/prom/push -phttp2=off -o stdout -v
$ cat prometheus_remote_write.yml
# prometheus global config
global:
  scrape_interval:  5s
  external_labels:
    environment: dev

scrape_configs:
- job_name: node
  static_configs:
  - targets: ['localhost:9100']

remote_write:
- url: http://localhost:8080/api/prom/push

Then start the Prometheus instance with:

$ ./prometheus --config.file=prometheus_remote_write.yml

And simply start node_exporter:

$ ./node_exporter

With these commands, Fluent Bit can ingest payloads that Prometheus sends over the remote write protocol.

Fluent Bit v3.0.4
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

___________.__                        __    __________.__  __          ________  
\_   _____/|  |  __ __   ____   _____/  |_  \______   \__|/  |_  ___  _\_____  \ 
 |    __)  |  | |  |  \_/ __ \ /    \   __\  |    |  _/  \   __\ \  \/ / _(__  < 
 |     \   |  |_|  |  /\  ___/|   |  \  |    |    |   \  ||  |    \   / /       \
 \___  /   |____/____/  \___  >___|  /__|    |______  /__||__|     \_/ /______  /
     \/                     \/     \/               \/                        \/ 

[2024/04/30 15:21:18] [ info] Configuration:
[2024/04/30 15:21:18] [ info]  flush time     | 1.000000 seconds
[2024/04/30 15:21:18] [ info]  grace          | 5 seconds
[2024/04/30 15:21:18] [ info]  daemon         | 0
[2024/04/30 15:21:18] [ info] ___________
[2024/04/30 15:21:18] [ info]  inputs:
[2024/04/30 15:21:18] [ info]      prometheus_remote_write
[2024/04/30 15:21:18] [ info] ___________
[2024/04/30 15:21:18] [ info]  filters:
[2024/04/30 15:21:18] [ info] ___________
[2024/04/30 15:21:18] [ info]  outputs:
[2024/04/30 15:21:18] [ info]      stdout.0
[2024/04/30 15:21:18] [ info] ___________
[2024/04/30 15:21:18] [ info]  collectors:
[2024/04/30 15:21:18] [ info] [fluent bit] version=3.0.4, commit=98eb7da63a, pid=288363
[2024/04/30 15:21:18] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2024/04/30 15:21:18] [ info] [storage] ver=1.1.6, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/04/30 15:21:18] [ info] [cmetrics] version=0.9.0
[2024/04/30 15:21:18] [ info] [ctraces ] version=0.5.1
[2024/04/30 15:21:18] [ info] [input:prometheus_remote_write:prometheus_remote_write.0] initializing
[2024/04/30 15:21:18] [ info] [input:prometheus_remote_write:prometheus_remote_write.0] storage_strategy='memory' (memory only)
[2024/04/30 15:21:18] [debug] [prometheus_remote_write:prometheus_remote_write.0] created event channels: read=21 write=22
[2024/04/30 15:21:18] [debug] [downstream] listening on 0.0.0.0:8080
[2024/04/30 15:21:18] [ info] [input:prometheus_remote_write:prometheus_remote_write.0] listening on 0.0.0.0:8080
[2024/04/30 15:21:18] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2024/04/30 15:21:18] [ info] [sp] stream processor started
[2024/04/30 15:21:18] [ info] [output:stdout:stdout.0] worker #0 started
[2024/04/30 15:21:41] [debug] [task] created task=0x8e7e770 id=0 OK
[2024/04/30 15:21:41] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
2024-04-30T06:21:26.551000000Z go_gc_duration_seconds{__name__="go_gc_duration_seconds",environment="dev",instance="localhost:9100",job="node",quantile="0"} = 0
2024-04-30T06:21:26.551000000Z go_gc_duration_seconds{__name__="go_gc_duration_seconds",environment="dev",instance="localhost:9100",job="node",quantile="0.25"} = 0
2024-04-30T06:21:26.551000000Z go_gc_duration_seconds{__name__="go_gc_duration_seconds",environment="dev",instance="localhost:9100",job="node",quantile="0.5"} = 0
2024-04-30T06:21:26.551000000Z go_gc_duration_seconds{__name__="go_gc_duration_seconds",environment="dev",instance="localhost:9100",job="node",quantile="0.75"} = 0
2024-04-30T06:21:26.551000000Z go_gc_duration_seconds{__name__="go_gc_duration_seconds",environment="dev",instance="localhost:9100",job="node",quantile="1"} = 0
2024-04-30T06:21:26.551000000Z go_gc_duration_seconds_sum{__name__="go_gc_duration_seconds_sum",environment="dev",instance="localhost:9100",job="node"} = 0
2024-04-30T06:21:26.551000000Z go_gc_duration_seconds_count{__name__="go_gc_duration_seconds_count",environment="dev",instance="localhost:9100",job="node"} = 0
2024-04-30T06:21:26.551000000Z go_goroutines{__name__="go_goroutines",environment="dev",instance="localhost:9100",job="node"} = 7
2024-04-30T06:21:26.551000000Z go_info{__name__="go_info",environment="dev",instance="localhost:9100",job="node",version="go1.21.1"} = 1
2024-04-30T06:21:26.551000000Z go_memstats_alloc_bytes{__name__="go_memstats_alloc_bytes",environment="dev",instance="localhost:9100",job="node"
<snip>
[2024/04/30 15:21:57] [debug] [out flush] cb_destroy coro_id=2
[2024/04/30 15:21:57] [debug] [task] destroy task=0x89656f0 (task_id=0)
[2024/04/30 15:21:58] [engine] caught signal (SIGINT)
[2024/04/30 15:21:58] [ warn] [engine] service will shutdown in max 5 seconds
[2024/04/30 15:21:59] [ info] [engine] service has stopped (0 pending tasks)
[2024/04/30 15:21:59] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2024/04/30 15:21:59] [ info] [output:stdout:stdout.0] thread worker #0 stopped
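
For a quick test without Prometheus and node_exporter, a remote write request can also be built by hand. The sketch below is an illustration only, not code from this PR. It encodes the `WriteRequest`, `TimeSeries`, `Label` and `Sample` protobuf messages manually and wraps them in a snappy block made of uncompressed literal chunks, which is valid framing but does no compression. The metric `demo_metric`, the `job="sketch"` label and all helper names are made up. The headers are the ones Prometheus itself sends, and whether the input plugin requires each of them is an assumption.

```python
import struct
import urllib.request


def varint(n):
    """Encode a non-negative integer as a base-128 varint (protobuf and snappy)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def delimited(field_no, payload):
    """Encode a length-delimited protobuf field (wire type 2)."""
    return varint((field_no << 3) | 2) + varint(len(payload)) + payload


def encode_label(name, value):
    # Label { string name = 1; string value = 2; }
    return delimited(1, name.encode()) + delimited(2, value.encode())


def encode_sample(value, timestamp_ms):
    # Sample { double value = 1; int64 timestamp = 2; }
    # Only non-negative timestamps are handled in this sketch.
    return b"\x09" + struct.pack("<d", value) + b"\x10" + varint(timestamp_ms)


def encode_write_request(series):
    """series: list of (labels dict, [(value, timestamp_ms), ...]) tuples."""
    body = b""
    for labels, samples in series:
        # Labels are emitted sorted by name.
        ts = b"".join(delimited(1, encode_label(k, v))
                      for k, v in sorted(labels.items()))
        ts += b"".join(delimited(2, encode_sample(v, t)) for v, t in samples)
        body += delimited(1, ts)  # WriteRequest { repeated TimeSeries timeseries = 1; }
    return body


def snappy_literal_only(data):
    """Wrap data in a snappy block that contains only literal chunks."""
    out = bytearray(varint(len(data)))  # preamble: uncompressed length
    for i in range(0, len(data), 60):
        chunk = data[i:i + 60]
        out.append((len(chunk) - 1) << 2)  # literal tag for lengths 1..60
        out += chunk
    return bytes(out)


def post_write_request(url, series):
    """POST one remote write request and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=snappy_literal_only(encode_write_request(series)),
        method="POST",
        headers={
            "Content-Type": "application/x-protobuf",
            "Content-Encoding": "snappy",
            "X-Prometheus-Remote-Write-Version": "0.1.0",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical sample series; timestamps are milliseconds since the epoch.
series = [({"__name__": "demo_metric", "job": "sketch"}, [(1.0, 1714458086551)])]
body = snappy_literal_only(encode_write_request(series))
print(len(body), "bytes ready to send")
# With Fluent Bit started as in the command above, the request could be sent with:
# post_write_request("http://localhost:8080/api/prom/push", series)
```

If the request is accepted, the stdout output should show a `demo_metric` line in the same format as the node_exporter metrics above.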

cosmo0920, Apr 30 '24 06:04