fluent-plugin-prometheus: Provide a way to remove metrics that are no longer published

It seems the exporter currently has no way to stop exposing metrics that are obsolete (that is, never going to be updated again). For example, a Kubernetes cluster can have thousands of containers recreated every hour. We collect Fluentd logging stats (message rate, via the flowcounter plugin) for each container and use Prometheus to aggregate them. For example, we have a "fluentd_flowcounter_count_rate" metric and use Prometheus labels to tag the individual container/pod each sample belongs to. This works fine; the only problem is that the Fluentd Prometheus exporter keeps exposing metrics that have not been published for a long time and are obsolete (flowcounter no longer reports stats for that log file because the log file was removed and the container deleted). At our rate of container deletion and creation, the exporter output quickly becomes polluted with a large number of obsolete metrics.

Is there a way to make fluent-plugin-prometheus stop publishing idle metrics? Perhaps an extra attribute could be introduced on the metric to specify an idle timeout, after which the metric would be removed from the exporter output?
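To make the idle-timeout idea concrete, here is a minimal sketch in Python (the plugin itself is written in Ruby, so this is only an illustration of the behaviour, not a patch). The class name `IdleExpiringGauge`, the `idle_timeout` parameter, and the injectable `clock` are all hypothetical; nothing like this exists in fluent-plugin-prometheus or in the Prometheus client libraries as far as this thread establishes. The idea is to remember when each label set was last updated and drop the ones that have been idle too long before rendering the output.

```python
import time
from typing import Callable, Dict, Tuple

# A label set is stored as a sorted tuple of (name, value) pairs so it can be a dict key.
LabelKey = Tuple[Tuple[str, str], ...]


class IdleExpiringGauge:
    """Gauge that forgets label sets not updated within idle_timeout seconds.

    Hypothetical illustration only; not part of fluent-plugin-prometheus
    or of any Prometheus client library.
    """

    def __init__(self, name: str, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self.idle_timeout = idle_timeout
        self._clock = clock
        # label set -> (last value, time of last update)
        self._series: Dict[LabelKey, Tuple[float, float]] = {}

    def set(self, labels: Dict[str, str], value: float) -> None:
        """Record a value and refresh the last-update time for this label set."""
        key = tuple(sorted(labels.items()))
        self._series[key] = (value, self._clock())

    def expire(self) -> None:
        """Drop every series whose last update is older than the timeout."""
        now = self._clock()
        stale = [key for key, (_, updated) in self._series.items()
                 if now - updated > self.idle_timeout]
        for key in stale:
            del self._series[key]

    def render(self) -> str:
        """Return exposition-style text for the series that are still live."""
        self.expire()
        lines = [f"# TYPE {self.name} gauge"]
        for key, (value, _) in sorted(self._series.items()):
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.name}{{{label_text}}} {value}")
        return "\n".join(lines) + "\n"
```

With something like this, a series for a deleted container's log file would disappear from the output once no update has arrived for `idle_timeout` seconds, while series that flowcounter keeps reporting would stay. Expiring at render time keeps the sketch free of background threads; a real implementation might prefer a periodic sweep.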

System info: fluentd-0.12.34 'fluent-plugin-prometheus' : '0.3.0'

Fluentd prometheus output configuration:

<match **.log>
  @type copy
  <store>
    @type flowcounter
    count_keys *
    unit minute
    aggregate tag
    output_style tagged
    delete_idle true
  </store>
</match>

<filter flowcount>
  @type record_transformer
  enable_ruby true
  remove_keys kubernetes_pod_name,kubernetes_namespace,app,job,instance,pod_template_generation,version

  <record>
    fluentd-tag ${record['tag']}
  </record>
</filter>

<filter flowcount>
  @type prometheus
  <labels>
    tag ${fluentd-tag}
  </labels>
  <metric>
    name fluentd_flowcounter_count_rate
    type gauge
    desc count rate
    key count_rate
  </metric>
</filter>

Output of the exporter:

# TYPE fluentd_flowcounter_count_rate gauge
# HELP fluentd_flowcounter_count_rate count rate
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.4.log"} 0.1
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.7.log"} 0.05
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.5.log"} 0.16
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.1.log"} 0.13
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.6.log"} 0.08
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.9.log"} 0.05
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.2.log"} 0.13
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.8.log"} 0.15
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.3.log"} 0.18
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.10.log"} 0.11

For example, after we delete kubernetes.var.log.containers.test_service.10.log and kubernetes.var.log.containers.test_service.9.log, the exporter still displays their metrics even though they will never be updated again.

Apr 11 '17 17:04 lruslan

Yes, we need a way to remove old, obsolete metrics. There is no way to do that for now.

Apr 29 '17 08:04 kazegusuri

I am also facing the same issue. Prometheus still shows metrics that are no longer published. Is there a way to solve this issue yet?

Feb 13 '18 09:02 ghost

+1

Apr 10 '18 08:04 diannaowa

I looked into how to implement this, but the official Prometheus client does not have a feature to remove metrics. So please vote for this feature in the official repository.

Apr 30 '18 11:04 kazegusuri

I have this issue because I'm monitoring VM instances that come and go. My question is why does the plugin hold on to the 'stale' data and present the last event over and over? That also leaves the situation that if the VM dies it looks like it's running in prometheus.

Mar 15 '19 18:03 rickw

Could you add the concept of retention, like in the grok exporter? https://github.com/fstab/grok_exporter/blob/master/CONFIG.md#retention

Dec 09 '19 10:12 lpetre

Any update on this?

Jul 14 '20 08:07 davidone

any update?

Feb 03 '22 12:02 raanand-dig

@kazegusuri or others, do you have an update on this issue? I reproduced the problem, and indeed the metric samples are not removed. This is problematic in the Kubernetes ecosystem. Otherwise, do you have a workaround?

May 04 '22 10:05 yeplaa

Hi all, I am facing the same issue with fluent-plugin-prometheus 2.0.2. Although the container has been deleted and its log file is no longer there, fluentd continues to publish its metric. Has anyone found a workaround on the fluentd/Prometheus side? Any pointers will be appreciated.

Dec 01 '22 15:12 aggarwalShivani

Hi all, I am facing the same issue with fluent-plugin-prometheus 2.1.2. :( I can't find a workaround.

Jun 28 '23 15:06 arsbest

Hi, this is indeed a big issue. As @lpetre suggested, adding a retention setting could be an easy workaround. Is it possible to increase the priority of this ticket, please?

Jan 09 '24 10:01 amaury-d
