fluent-plugin-prometheus

LabelSetValidator::InvalidLabelSetError on passing metrics from fluentd

Open ArunDahiya1 opened this issue 7 years ago • 3 comments

I have set multiple labels on a matching tag in the Fluentd configuration file. These keys might or might not be present in the incoming logs. I am getting the following in td-agent.log:

2018-07-10 12:53:00 +0000 [warn]: unknown placeholder ${u} found
2018-07-10 12:53:00 +0000 [warn]: unknown placeholder ${bid} found
2018-07-10 12:53:00 +0000 [warn]: unknown placeholder ${cnt} found
2018-07-10 12:53:00 +0000 [warn]: unknown placeholder ${curr} found
2018-07-10 12:53:00 +0000 [warn]: prometheus: failed to instrument a metric. error_class=Prometheus::Client::LabelSetValidator::InvalidLabelSetError error=#<Prometheus::Client::LabelSetValidator::InvalidLabelSetError: labels must have the same signature> tag="Tag1" name="fluentd_output_status_num_records_total"
2018-07-10 12:53:00 +0000 [warn]: dump an error event: error_class=Prometheus::Client::LabelSetValidator::InvalidLabelSetError error="labels must have the same signature" tag="Tag1" time=1531227180 record={"name"=>"Tag1", "pid"=>11289, "level"=>50, "c"=>"client", "err"=>"BrokerNotAvailableError: Broker not available", "s"=>"Unsubscribe", "tag"=>"kafka-failure", "msg"=>"Broker not available", "time"=>{}, "v"=>0}

Configuration for the corresponding tag looks like this:

<match Tag1.**>
  @type prometheus
  <metric>
    name fluentd_output_status_num_records_total
    type counter
    desc The total number of outgoing records
    <labels>
      level ${level}
      error ${err}
      user ${u}
      client ${c}
      batchID ${bid}
      count ${cnt}
      sessions ${curr}
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>
</match>

Also, I am not able to see the corresponding metrics in Prometheus. It looks like this error is blocking them.

ArunDahiya1 avatar Jul 10 '18 13:07 ArunDahiya1

Also, my logs look like this:

2018-07-10T12:50:43Z Tag1 {"name":"Tag1","pid":11289,"level":50,"c":"client","err":"BrokerNotAvailableError: Broker not available","s":"Unsubscribe","tag":"kafka-failure","msg":"Broker not available","time":{},"v":0}

ArunDahiya1 avatar Jul 10 '18 13:07 ArunDahiya1
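
The sample record above has no u, bid, cnt, or curr keys, which matches the four "unknown placeholder" warnings. A minimal sketch of one way to make every placeholder resolve is to fill in a default for missing keys with record_transformer before the prometheus match. The default value "none" is a hypothetical choice, and whether this alone clears the InvalidLabelSetError is not confirmed in this thread.

```
# Sketch only: must appear before the <match Tag1.**> block in the config.
<filter Tag1.**>
  @type record_transformer
  enable_ruby true
  <record>
    # Keep the existing value if present, otherwise use a placeholder default.
    u ${record["u"] || "none"}
    bid ${record["bid"] || "none"}
    cnt ${record["cnt"] || "none"}
    curr ${record["curr"] || "none"}
  </record>
</filter>
```

With this filter in place, every record reaching the prometheus output carries all the keys referenced in the labels section, so the label values are always defined.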

+1, same problem here using Fluentd (v1.3.2) and fluent-plugin-prometheus (v1.6.0) in Kubernetes (v1.14).

Fluentd configuration extract:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-opa
  namespace: opa
data:
  fluent.conf: |

    # get logs from /var/log/containers/
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag kubernetes.*
      format json
      read_from_head true
    </source>

    # instrument metrics from records
    # no impact against values of each records
    <filter kubernetes.var.log.containers.**opa**.log>
      @type prometheus
      <labels>
        status $.resp_status
      </labels>
      <metric>
        name opa_decisions_total
        type counter
        desc The total number of OPA decisions.
        # No key means increment counter for each record
      </metric>
    </filter>

    # output plugin for prometheus type
    <match kubernetes.var.log.containers.**opa**.log>
      @type copy
      <store>
        @type prometheus
        <metric>
          name opa_decisions_total
          type counter
          desc The total number of OPA decisions.
          # No key means increment counter for each record
        </metric>
      </store>
      <store>
        @type stdout
      </store>
    </match>

    # provides a metrics HTTP endpoint to be scraped by a Prometheus server
    # expose custom and default on container localhost
    <source>
      @type prometheus
      bind 0.0.0.0
      port 24224
      metrics_path /metrics
    </source>

Logs to parse:

{
	"log":"{
		"client_addr":"10.233.64.1:38700",
		"level":"info",
		"msg":"Sent response.",
		"req_id":12697,
		"req_method":"POST",
		"req_path":"/",
		"resp_body":"{
			"apiVersion":"admission.k8s.io/v1beta1",
			"kind":"AdmissionReview",
			"response":{
				"allowed":true
			}
		}",
		"resp_bytes":94,
		"resp_duration":3.421389,
		"resp_status":200,
		"time":"2019-09-20T13:40:11Z"
	}",
	"stream":"stderr"
}

Fluentd logs extract:

2019-10-01 08:32:36 +0000 [warn]: #0 prometheus: failed to instrument a metric. error_class=Prometheus::Client::LabelSetValidator::InvalidLabelSetError error=#<Prometheus::Client::LabelSetValidator::InvalidLabelSetError: labels must have the same signature (keys given: [] vs. keys expected: [:status]> tag="kubernetes.var.log.containers.opa-6849476f49-sc5qs_opa_opa-4d14b124df6764056629cd0b5e50ef7fea92fa9391ed031cb825549d3b74389f.log" name="opa_decisions_total"

However, I am able to see the metric opa_decisions_total, but I am not sure what it counts. curl -ks http://<endpoint-IP>:<endpoint-port>/metrics gives:

# TYPE opa_decisions_total counter
# HELP opa_decisions_total The total number of OPA decisions.
opa_decisions_total{status=""} 5462.0

Any ideas? Thanks.

manicole avatar Oct 01 '19 08:10 manicole
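
The error text in the Fluentd log above (keys given: [] vs. keys expected: [:status]) suggests that opa_decisions_total is registered with a status label by the filter and then instrumented again without labels by the prometheus store in the output. The status="" value in the scrape output is also consistent with resp_status sitting inside the "log" string rather than at the top level of the record. A minimal sketch under those assumptions: parse the log field first, then instrument the metric in one place only. The parser filter and the choice to keep the filter rather than the output are assumptions, not something confirmed in this thread.

```
# Assumption: the "log" field holds a JSON string; expand it into top-level keys.
# Time handling for the parsed "time" field may need extra <parse> options (not shown).
<filter kubernetes.var.log.containers.**opa**.log>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>

# Instrument the counter in one place, with one label set.
<filter kubernetes.var.log.containers.**opa**.log>
  @type prometheus
  <metric>
    name opa_decisions_total
    type counter
    desc The total number of OPA decisions.
    <labels>
      status $.resp_status
    </labels>
  </metric>
</filter>

# The output no longer declares opa_decisions_total a second time.
<match kubernetes.var.log.containers.**opa**.log>
  @type stdout
</match>
```

Declaring the metric once also avoids counting each record twice, which would happen if both the filter and the output incremented the same counter.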

Worst workaround possible, but it works: leverage tags.

<match nginx.access.base>
  @type route
  <route **>
    add_tag_prefix storage
    copy 
  </route>
  <route **>
    add_tag_prefix metrics
    copy
  </route>
</match>

First copy the stream (above), so you can do terrible things to one copy. Then scrub the value so that it does not contain the tag separator '.':

<filter metrics.nginx.access.base>
  @type record_transformer
  auto_typecast
  enable_ruby
  <record>
    domainclean ${ record["domain"].gsub(/https?:\/\//, "").gsub(".", "-") }
  </record>
</filter>

Then append the new clean value to the tag:

<match metrics.nginx.access.base>
  @type rewrite_tag_filter
  <rule>
    key domainclean
    pattern ^(.+)$
    tag https.${record['domainclean']}
  </rule>
</match>

Then write to Prometheus and storage in two separate blocks. The Prometheus block:

<match https.**>
  @type prometheus
  <metric>
    name fluentd_output_status_num_records_total
    type counter
    desc The total number of outgoing records
    <labels>
      tag ${tag}
      hostname ${hostname}
      logname access_base
    </labels>
  </metric>
</match>

Then the Grafana templating fun starts.

Query for the template variable:

label_values(fluentd_output_status_num_records_total{instance_name=~"app-prod-events.*",tag=~"http.*"},tag)

Regex to filter the tag: /https\.([^.]+)\..*/ returns the first "value" after 'https.', and /https\.[^.]+\.([^.]+)\..*/ returns the second "value" after 'https.'.

Then turn on Value groups/tags (Experimental feature) and set the tag query to label_values(tag).

You can also use label_replace:

sum by (domain_group) (
  label_replace(
    label_replace(rate(fluentd_output_status_num_records_total{instance_name=~"app-server-.*",tag=~"https.*"}[30s]), "domain_group", "$1", "tag", ".+"),
    "domain_group", "$1", "tag", `https\.([^\.]*)\..*`
  )
)

It's ugly, but it works.

austinhquinn avatar Sep 24 '21 17:09 austinhquinn
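
If the only goal of the tag rewriting is to get the domain into a label, a possibly simpler variant is to reference the cleaned field directly with a record accessor, the $.key syntax used in an earlier comment. A sketch, assuming domainclean is present on the records, that the installed plugin version supports record accessors in labels, and that the rewrite_tag_filter block is dropped so the tag stays metrics.nginx.access.base. The label name domain is a hypothetical choice.

```
<match metrics.nginx.access.base>
  @type prometheus
  <metric>
    name fluentd_output_status_num_records_total
    type counter
    desc The total number of outgoing records
    <labels>
      tag ${tag}
      hostname ${hostname}
      logname access_base
      # Assumption: domainclean was added by the record_transformer filter above.
      domain $.domainclean
    </labels>
  </metric>
</match>
```

In this variant the Grafana template variable could query the domain label directly instead of extracting it from the tag with a regex.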