fluent-plugin-prometheus
Provide a way to remove metrics that are no longer published
It seems the exporter currently has no way to stop displaying metrics that are obsolete (never going to be updated again). For example, in a Kubernetes cluster we can have thousands of containers recreated every hour. We collect fluentd logging stats (message rate, via the flowcounter plugin) for each container and use Prometheus to aggregate these stats. For example, we have a "fluentd_flowcounter_count_rate" metric and use Prometheus labels to tag the individual container/pod each sample belongs to. This works fine; the only problem is that the fluentd Prometheus exporter keeps showing metrics that have not been published for a long time and are obsolete (flowcounter no longer reports stats for the log file, the log file is removed, the container is deleted). With our rate of container creation and deletion, the exporter's output quickly becomes polluted with a large number of obsolete metrics.
Is there a way to make fluent-plugin-prometheus stop publishing idle metrics? Maybe it's possible to introduce an extra attribute on the metric that specifies an idle timeout, after which the metric is removed from publishing?
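For example, a hypothetical retention attribute inside <metric> (this option does not exist in the plugin today; the attribute name and the seconds value are made up for illustration) could look like:

<filter flowcount>
  @type prometheus
  <metric>
    name fluentd_flowcounter_count_rate
    type gauge
    desc count rate
    key count_rate
    retention 300  # hypothetical: drop a label set after 300s without updates
  </metric>
</filter>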
System info: fluentd 0.12.34, fluent-plugin-prometheus 0.3.0
Fluentd Prometheus configuration:
<match **.log>
  @type copy
  <store>
    type flowcounter
    count_keys *
    unit minute
    aggregate tag
    output_style tagged
    delete_idle true
  </store>
</match>

<filter flowcount>
  @type record_transformer
  enable_ruby true
  remove_keys kubernetes_pod_name,kubernetes_namespace,app,job,instance,pod_template_generation,version
  <record>
    fluentd-tag ${record['tag']}
  </record>
</filter>

<filter flowcount>
  @type prometheus
  <labels>
    tag ${fluentd-tag}
  </labels>
  <metric>
    name fluentd_flowcounter_count_rate
    type gauge
    desc count rate
    key count_rate
  </metric>
</filter>
Output of the exporter:

# TYPE fluentd_flowcounter_count_rate gauge
# HELP fluentd_flowcounter_count_rate count rate
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.4.log"} 0.1
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.7.log"} 0.05
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.5.log"} 0.16
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.1.log"} 0.13
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.6.log"} 0.08
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.9.log"} 0.05
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.2.log"} 0.13
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.8.log"} 0.15
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.3.log"} 0.18
fluentd_flowcounter_count_rate{tag="kubernetes.var.log.containers.test_service.10.log"} 0.11
For example, after we delete kubernetes.var.log.containers.test_service.10.log and kubernetes.var.log.containers.test_service.9.log, they will still be displayed by the exporter even though they will never be updated again.
Yes, we need a way to remove old, deprecated metrics. There is no way to do this for now.
I am also facing the same issue: Prometheus is still showing metrics that are no longer published. Is there a way to solve this issue yet?
+1
After considering how to implement this: the official Prometheus client library does not have a feature to remove metrics, so please vote for this in the official client's repository.
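To illustrate the limitation, here is a minimal sketch against the prometheus-client 0.x Ruby gem that fluent-plugin-prometheus 0.3.0 builds on (metric and label values below are examples): once a label set has been observed, it stays in the metric's value table, and the client exposes no call to delete an individual series.

require 'prometheus/client'

registry = Prometheus::Client.registry
# prometheus-client 0.x API: Gauge.new(name, docstring); set(labels, value).
gauge = Prometheus::Client::Gauge.new(:fluentd_flowcounter_count_rate, 'count rate')
registry.register(gauge)

gauge.set({ tag: 'test_service.9.log' }, 0.05)
gauge.set({ tag: 'test_service.10.log' }, 0.11)

# Both label sets now live in the metric's value table, and the library
# has no public method to drop one of them, so the exporter renders both
# samples on every scrape, forever:
gauge.values
# => {{:tag=>"test_service.9.log"}=>0.05, {:tag=>"test_service.10.log"}=>0.11}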
I have this issue because I'm monitoring VM instances that come and go. My question is: why does the plugin hold on to the stale data and present the last event over and over? It also means that if a VM dies, it still looks like it's running in Prometheus.
Could you add the concept of retention, like in the grok exporter? https://github.com/fstab/grok_exporter/blob/master/CONFIG.md#retention
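For reference, in the linked grok_exporter docs retention is a per-metric duration after which stale label sets stop being exported; a sketch based on that page (values illustrative):

metrics:
  - type: gauge
    name: fluentd_flowcounter_count_rate
    help: count rate
    retention: 10m  # label sets not updated for 10 minutes are removed

An analogous per-metric option in fluent-plugin-prometheus would cover the Kubernetes use case described above.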
Any update on this?
Any update?
@kazegusuri or others, do you have an update on this issue? I reproduced the problem, and indeed the metric samples are not removed. It's problematic in the Kubernetes ecosystem. Otherwise, do you have a workaround?
Hi all, I am facing the same issue with fluent-plugin-prometheus 2.0.2. Even after a container is deleted and its log file is no longer there, fluentd continues to publish its metric. Has anyone found a workaround on the fluentd/Prometheus side? Any pointers would be appreciated.
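One crude workaround until this is supported (my own suggestion, not an official fix): the accumulated label sets live only in fluentd's memory, so restarting fluentd clears them, at the cost of resetting all counters. On Kubernetes that can be scheduled, e.g.:

kubectl rollout restart daemonset/fluentd -n logging  # daemonset name and namespace are examples

Prometheus rate() tolerates the counter resets, and any series that is still active simply reappears on the next scrape after the restart.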
Hi all, I am facing the same issue with fluent-plugin-prometheus 2.1.2. :( I can't find a workaround.
Hi, this is indeed a big issue. As @lpetre suggested, adding a retention setting could be an easy workaround. Is it possible to increase the priority of this ticket, please?