
Multiple prometheus instances

Open · johnhtodd opened this issue 1 year ago · 1 comment

Is your feature request related to a problem? Please describe. Being able to define multiple Prometheus consumers would be very useful. This would allow statistical collection on any set of filtered criteria. Right now, it is impossible to get statistics on specific criteria - all Prometheus data is lumped together. Being able to separate out different subsets of data might entirely eliminate the need to centralize DNSTAP data and log it into a database for future queries.

Describe the solution you'd like It would be ideal if multiple Prometheus logging consumers could be specified at the end of a chain of pipelines or filters.

Here's a fully-formed potential example file. It's essentially the same syntax as today, the only addition being a new way of parsing the "prometheus-labels:" primitive to include arbitrary labels in the stored data for that particular instantiation of the Prometheus consumer.

pipelines:
  - name: dnsdist-receiver-from-prod-machines
    dnstap:
      listen-ip: 0.0.0.0
      listen-port: 8173
    routes: [ apple-only, all-data ]

  - name: prom-all-queries
    prometheus:
      listen-ip: 0.0.0.0
      listen-port: 8081
      basic-auth-enable: false
      prometheus-prefix: "dnscollector"
      top-n: 10
      histogram-metrics-enabled: true
      prometheus-labels: ["stream_id", "job=all-queries"]

  - name: prom-apple
    prometheus:
      listen-ip: 0.0.0.0
      listen-port: 8082
      basic-auth-enable: false
      prometheus-prefix: "dnscollector"
      top-n: 10
      histogram-metrics-enabled: true
      prometheus-labels: ["stream_id", "job=apple.com"]

  - name: apple-only
    dnsmessage:
      matching:
        include:
          dnstap.operation: "CLIENT_RESPONSE"
          dns.qname: "^.*\\.apple\\.com$"
      policy: "drop-unmatched"
    routes: [ prom-apple ]

  - name: all-data
    dnsmessage:
      matching:
        include:
          dnstap.operation: "CLIENT_RESPONSE"
      policy: "drop-unmatched"
    routes: [ outputfile, prom-all-queries ]

  - name: outputfile
    logfile:
      file-path: "/var/log/dnstap.log"
      max-size: 1000
      max-files: 10
      mode: json

Describe alternatives you've considered DNSTAP data is collected for insights. If it is possible to create insights via a TSDB (Prometheus) scrape of go-dnscollector with specific pre-understood filtered insights, then it may minimize the need for much of the DNSTAP data to be forwarded to a central collector, or for database queries to somehow be applied to the historical data set. This may not be the majority of cases, but there are certainly subsets of data of high interest to DNS system administrators which could be aggregated very efficiently by go-dnscollector if they could be segmented out into Prometheus metrics at the edge of the collection pipeline instead of at the very end. The alternative to this method is the traditional database query run against the final set of DNSTAP data residing in a data pool somewhere, which is slow and batch-based rather than real-time ingestion into a TSDB.

Additional context Each Prometheus instance would need to somehow distinguish itself from the others. The very good news is that Prometheus already solves this today with the concept of labels. Alternatively, a separate port (or alternate URL endpoint) could be used, though that may add configuration complexity.

I see two possible methods to segment the multiple Prometheus data sets from each other, which do not necessarily conflict:

  • enforce configuration parsing rules requiring that each Prometheus instance has its own port number or endpoint (/metrics, /metrics/apple, /metrics-apple, or similar configurable names). Each instance would build a set of Prometheus metrics based on the data it receives, plus the "go_" and "process_" outputs from the main code routines.

  • enforce configuration parsing rules demanding that each Prometheus declaration stanza has at least one label that is different from any other instantiation ( "prometheus-labels: ["stream_id", "job=all-queries"]" and "prometheus-labels: ["stream_id", "job=apple.com"]" as shown above would be sufficient even if we omitted port numbers, as an example), but allow multiple instances to be interleaved in the same query to "/metrics" - they would be distinct due to their different labels. If this method is used, then the "go_" and "process_" results need only be shown once.

I have no particular bias towards one of these methods, and can see uses for both. Perhaps they are both used.
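To make the second method concrete, a single scrape of "/metrics" could interleave series from both instances, distinguishable only by their labels. The sketch below assumes a hypothetical counter name built from the "prometheus-prefix" in the config above; the actual metric names go-dnscollector exports may differ:

```
# One /metrics endpoint, two logical instances separated by labels
# (metric name and values are illustrative)
dnscollector_queries_total{stream_id="dnsdist-receiver-from-prod-machines",job="all-queries"} 123456
dnscollector_queries_total{stream_id="dnsdist-receiver-from-prod-machines",job="apple.com"} 789
# "go_" and "process_" series would appear only once, unlabeled per instance
go_goroutines 42
```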

johnhtodd · Dec 14 '23 21:12

This is even more useful with the "TopN" model, just by itself. Given the flexibility of the pipeline and filter concepts, having separate Prometheus instances would solve a significant number of issues. With multiple Prometheus pipelines possible, the "TopN" concept could divide interesting traffic into fairly large buckets ("Top 100 AAAA records that are SERVFAIL", "Top 3000 NOERROR responses with the DNSSEC AD bit set", etc.) without everything being combined into a single job, or metrics stepping on each other. (Sorry for the enthusiasm here, but the more I think about this, the more useful it seems for snapshot understanding of system behaviors via Prometheus, in conjunction with the DNSTAP reporting back to a core analysis platform.)

johnhtodd · Dec 20 '23 20:12