fluent-plugin-prometheus
LabelSetValidator::InvalidLabelSetError on passing metrics from fluentd
I have set multiple labels on a matching tag in my Fluentd configuration file. The keys they refer to may or may not be present in the incoming logs. I am getting the following in td-agent.log:
2018-07-10 12:53:00 +0000 [warn]: unknown placeholder ${u} found
2018-07-10 12:53:00 +0000 [warn]: unknown placeholder ${bid} found
2018-07-10 12:53:00 +0000 [warn]: unknown placeholder ${cnt} found
2018-07-10 12:53:00 +0000 [warn]: unknown placeholder ${curr} found
2018-07-10 12:53:00 +0000 [warn]: prometheus: failed to instrument a metric. error_class=Prometheus::Client::LabelSetValidator::InvalidLabelSetError error=#<Prometheus::Client::LabelSetValidator::InvalidLabelSetError: labels must have the same signature> tag="Tag1" name="fluentd_output_status_num_records_total"
2018-07-10 12:53:00 +0000 [warn]: dump an error event: error_class=Prometheus::Client::LabelSetValidator::InvalidLabelSetError error="labels must have the same signature" tag="Tag1" time=1531227180 record={"name"=>"Tag1", "pid"=>11289, "level"=>50, "c"=>"client", "err"=>"BrokerNotAvailableError: Broker not available", "s"=>"Unsubscribe", "tag"=>"kafka-failure", "msg"=>"Broker not available", "time"=>{}, "v"=>0}
Configuration for the corresponding tag looks like this:
<match Tag1.**>
@type prometheus
<metric>
name fluentd_output_status_num_records_total
type counter
desc The total number of outgoing records
<labels>
level ${level}
error ${err}
user ${u}
client ${c}
batchID ${bid}
count ${cnt}
sessions ${curr}
tag ${tag}
hostname ${hostname}
</labels>
</metric>
</match>
I am also not able to see the corresponding metrics in Prometheus. It looks like this error is blocking them.
For reference, my logs look like this: 2018-07-10T12:50:43Z Tag1 {"name":"Tag1","pid":11289,"level":50,"c":"client","err":"BrokerNotAvailableError: Broker not available","s":"Unsubscribe","tag":"kafka-failure","msg":"Broker not available","time":{},"v":0}
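The `unknown placeholder` warnings appear because keys such as `u`, `bid`, `cnt` and `curr` are absent from this record. One way to avoid them is to give every label key a default before the record reaches the prometheus plugin, for example in a `record_transformer` filter with `enable_ruby`. The Ruby below is only a sketch of that defaulting logic: `LABEL_KEYS` and `with_label_defaults` are hypothetical names, and whether this also clears the signature error depends on what else registers the same metric name.

```ruby
# Record keys referenced by the <labels> block in the config above
# (tag and hostname are supplied by Fluentd, not by the record).
LABEL_KEYS = %w[level err u c bid cnt curr].freeze

# Return a copy of the record in which every label key is present.
# Missing keys get an empty string; existing values are left untouched.
def with_label_defaults(record, keys = LABEL_KEYS)
  defaults = keys.to_h { |key| [key, ""] }
  defaults.merge(record)
end

record = {
  "name" => "Tag1",
  "level" => 50,
  "c" => "client",
  "err" => "BrokerNotAvailableError: Broker not available"
}

filled = with_label_defaults(record)
puts filled.values_at("u", "bid", "cnt", "curr").inspect
```

Inside a `<record>` block the same idea would be written per key, for example `u ${record["u"] || ""}`.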
+1, same problem here using Fluentd (v1.3.2) and fluent-plugin-prometheus (v1.6.0) in Kubernetes (v1.14).
Fluentd configuration extract:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-opa
  namespace: opa
data:
  fluent.conf: |
    # get logs from /var/log/containers/
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag kubernetes.*
      format json
      read_from_head true
    </source>
    # instrument metrics from records
    # no impact against values of each records
    <filter kubernetes.var.log.containers.**opa**.log>
      @type prometheus
      <labels>
        status $.resp_status
      </labels>
      <metric>
        name opa_decisions_total
        type counter
        desc The total number of OPA decisions.
        # No key means increment counter for each record
      </metric>
    </filter>
    # output plugin for prometheus type
    <match kubernetes.var.log.containers.**opa**.log>
      @type copy
      <store>
        @type prometheus
        <metric>
          name opa_decisions_total
          type counter
          desc The total number of OPA decisions.
          # No key means increment counter for each record
        </metric>
      </store>
      <store>
        @type stdout
      </store>
    </match>
    # provides a metrics HTTP endpoint to be scraped by a Prometheus server
    # expose custom and default on container localhost
    <source>
      @type prometheus
      bind 0.0.0.0
      port 24224
      metrics_path /metrics
    </source>
Logs to parse:
{
"log":"{
"client_addr":"10.233.64.1:38700",
"level":"info",
"msg":"Sent response.",
"req_id":12697,
"req_method":"POST",
"req_path":"/",
"resp_body":"{
"apiVersion":"admission.k8s.io/v1beta1",
"kind":"AdmissionReview",
"response":{
"allowed":true
}
}",
"resp_bytes":94,
"resp_duration":3.421389,
"resp_status":200,
"time":"2019-09-20T13:40:11Z"
}",
"stream":"stderr"
}
Fluentd logs extract:
2019-10-01 08:32:36 +0000 [warn]: #0 prometheus: failed to instrument a metric. error_class=Prometheus::Client::LabelSetValidator::InvalidLabelSetError error=#<Prometheus::Client::LabelSetValidator::InvalidLabelSetError: labels must have the same signature (keys given: [] vs. keys expected: [:status]> tag="kubernetes.var.log.containers.opa-6849476f49-sc5qs_opa_opa-4d14b124df6764056629cd0b5e50ef7fea92fa9391ed031cb825549d3b74389f.log" name="opa_decisions_total"
However, I am able to see the metric opa_decisions_total, but I am not sure what it counts.
curl -ks http://<endpoint-IP>:<endpoint-port>/metrics gives:
# TYPE opa_decisions_total counter
# HELP opa_decisions_total The total number of OPA decisions.
opa_decisions_total{status=""} 5462.0
Any ideas? Thanks.
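The `keys given: [] vs. keys expected: [:status]` part of the message suggests that the same metric name is being incremented with two different label sets: the filter declares a `status` label, while the `@type prometheus` store in the `<match>` block declares none. A toy Ruby model of that rule follows. `ToyCounter` is a hypothetical stand-in written for illustration; it is not the prometheus-client implementation.

```ruby
# Toy model of the "same signature" rule: every observation of one
# metric must use the same set of label keys.
class ToyCounter
  class InvalidLabelSetError < StandardError; end

  def initialize
    @expected_keys = nil      # label keys fixed by the first observation
    @values = Hash.new(0.0)   # one counter value per label set
  end

  # Increment the series identified by `labels`, rejecting any label set
  # whose keys differ from the first one seen.
  def increment(labels)
    keys = labels.keys.sort
    @expected_keys ||= keys
    unless keys == @expected_keys
      raise InvalidLabelSetError,
            "labels must have the same signature " \
            "(keys given: #{keys} vs. keys expected: #{@expected_keys})"
    end
    @values[labels] += 1
  end

  def get(labels)
    @values[labels]
  end
end

counter = ToyCounter.new
counter.increment({ status: "200" })  # like the filter: status label declared
begin
  counter.increment({})               # like the output store: no labels
rescue ToyCounter::InvalidLabelSetError => e
  puts e.message
end
```

If that is the cause here, declaring `opa_decisions_total` in only one place, or giving both declarations the same `<labels>`, should keep the key set consistent.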
This is the worst workaround possible, but it works: leverage tags.
<match nginx.access.base>
@type route
<route **>
add_tag_prefix storage
copy
</route>
<route **>
add_tag_prefix metrics
copy
</route>
</match>
First copy the stream, so you can do terrible things to it. Then scrub the value so that the tag separator '.' does not appear in it:
<filter metrics.nginx.access.base>
@type record_transformer
auto_typecast
enable_ruby
<record>
domainclean ${ record["domain"].gsub(/^https?:\/\//, "").gsub(".", "-") }
</record>
</filter>
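A note on the scrub expression: Ruby's `gsub` treats a String pattern as literal text, so stripping the URL scheme needs a Regexp. The sketch below compares the two forms; `clean_domain` and `clean_domain_literal` are hypothetical helper names, and the sample domain is made up.

```ruby
# Strip a leading http:// or https://, then replace dots with dashes so
# the value can be embedded in a Fluentd tag without adding segments.
def clean_domain(domain)
  domain.gsub(%r{\Ahttps?://}, "").gsub(".", "-")
end

# Same logic with a String pattern: the text "(http)(s?)(://)" is matched
# literally, never occurs in a URL, and so the scheme is left in place.
def clean_domain_literal(domain)
  domain.gsub("(http)(s?)(://)", "").gsub(".", "-")
end

puts clean_domain("https://www.example.com")
puts clean_domain_literal("https://www.example.com")
```

A value without a scheme, such as a bare hostname, comes out the same from both helpers, which may be why the difference goes unnoticed.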
Then append the cleaned value to the tag:
<match metrics.nginx.access.base>
@type rewrite_tag_filter
<rule>
key domainclean
pattern ^(.+)$
tag https.${record['domainclean']}
</rule>
</match>
Then write to Prometheus and storage in two separate blocks.
<match https.**>
@type prometheus
<metric>
name fluentd_output_status_num_records_total
type counter
desc The total number of outgoing records
<labels>
tag ${tag}
hostname ${hostname}
logname access_base
</labels>
</metric>
</match>
Then the Grafana templating fun starts.
Query for the template variable:
label_values(fluentd_output_status_num_records_total{instance_name=~"app-prod-events.*",tag=~"http.*"},tag)
Regex to filter the tag:
/https\.([^.]+)\..*/
which returns the first value after 'https.', and
/https\.[^.]+\.([^.]+)\..*/
which returns the second value after 'https.'.
Then turn on Value groups/tags (Experimental feature)
and set the tag query to:
label_values(tag)
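To check what the two filter regexes capture, they can be tried in plain Ruby, where they behave the same way for these patterns. The tag values below are hypothetical examples rather than values from a real setup, and `capture` is a helper name made up for this sketch.

```ruby
# Grafana keeps the first capture group of the variable regex.
FIRST_SEGMENT  = /https\.([^.]+)\..*/
SECOND_SEGMENT = /https\.[^.]+\.([^.]+)\..*/

# Return the first capture group, or nil when the regex does not match.
def capture(regex, tag)
  match = regex.match(tag)
  match && match[1]
end

tag = "https.example-com.access.base"
puts capture(FIRST_SEGMENT, tag)
puts capture(SECOND_SEGMENT, tag)

# Both regexes require a further '.' after the captured segment, so a
# tag that ends right after the first segment does not match.
p capture(FIRST_SEGMENT, "https.example-com")
```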
You can also use label_replace:
sum by (domain_group) (
label_replace(
label_replace(rate(fluentd_output_status_num_records_total{instance_name=~"app-server-.*",tag=~"https.*"}[30s]), "domain_group", "$1", "tag", ".+"),
"domain_group", "$1", "tag", `https\.([^\.]*)\..*`
)
)
It's ugly, but it works.