Enhancements to reducer uniqueness matching and additional json field
**Is your feature request related to a problem? Please describe.**
The reducer is very useful, but it does not have enough granularity. It is also difficult to understand what the reducer does.
I'd like to see the traffic reducer have more flexibility, which would open up significant new possibilities for creating summarized traffic data and would substantially reduce output volume. This is important not just for telemetry bandwidth; it is also vitally important for storage of telemetry data. Aggregation done at the edges of the network is far more efficient than aggregation done at the core, and it is often impossible at the core, because collecting the raw data centrally is not feasible given the bandwidth or processing power required.
My suggestion is that uniqueness matching be extended so that it can consider any part of a dnstap packet.
Why is this useful? We want to understand as much as possible while transmitting and archiving as little data as possible. We can do statistical sampling to get a window into our query sets, but that often ends up with lots of garbage queries swamping the legitimate data. We want a good understanding of the real query set, and to get it we would be happy with a counter on certain criteria instead of full-resolution data. With counters, huge volumes of garbage data collapse into a single reported packet every X seconds, and real data gets the same standing as garbage data. Our intent would be to do statistical sampling, but ALSO to produce counter-based summaries. Currently, the model for considering an event "repeated" is neither granular nor flexible enough, so we would like to make the method that determines uniqueness configurable.
**Describe the solution you'd like**
Configurable matching criteria for uniqueness, allowing the reducer to coalesce multiple events into a single event.
Instead of the hard-coded fields used today, any data in the packet should be usable for uniqueness matching. This includes labels that may have been added by go-dnscollector in other transforms, or even on another server. Wildcards would be supported to cover an entire subset of fields (such as "resource-records"). If a wildcard of "*" appears, then all sub-fields must match in order to increment the occurrences counter.
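To make the proposal concrete, here is a minimal sketch of how a configurable uniqueness key might be built from a flattened field map, including the "*" wildcard behavior. Everything here (the flattened-map representation, buildKey, the key format) is hypothetical illustration, not the collector's actual internals:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildKey derives a uniqueness key from a flattened view of a dnstap
// message (e.g. "dns.qname" -> "example.com."). A trailing ".*" wildcard
// pulls in every sub-field under that prefix, so "dns.resource-records.*"
// only matches when ALL sub-fields are equal. (Hypothetical sketch.)
func buildKey(flat map[string]string, uniqueFields []string) string {
	var parts []string
	for _, f := range uniqueFields {
		if prefix, ok := strings.CutSuffix(f, ".*"); ok {
			// Wildcard: collect every sub-field under the prefix,
			// sorted so the key is deterministic.
			var subs []string
			for k, v := range flat {
				if strings.HasPrefix(k, prefix+".") {
					subs = append(subs, k+"="+v)
				}
			}
			sort.Strings(subs)
			parts = append(parts, subs...)
		} else {
			parts = append(parts, f+"="+flat[f])
		}
	}
	return strings.Join(parts, ";")
}

func main() {
	msg := map[string]string{
		"dnstap.identity":  "resolver-1",
		"dnstap.operation": "CLIENT_QUERY",
		"dns.qname":        "example.com.",
		"dns.query-ip":     "192.0.2.10",
	}
	// Two messages producing the same key would bump one counter.
	fmt.Println(buildKey(msg, []string{"dnstap.identity", "dnstap.operation", "dns.qname"}))
}
```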
Examples: This would emulate the existing matching model:
```yaml
transforms:
  reducer:
    repetitive-traffic-detector: true
    qname-plus-one: false
    watch-interval: 5
    unique-fields: dnstap.identity, dnstap.operation, dns.qname, dns.query-ip, dns.qtype
```
As another example, this would be the unique-fields: syntax if we just want to ignore the query-ip; we would then get counters keyed only on dns.qname, which would really reduce our reporting output:
```yaml
unique-fields: dnstap.identity, dnstap.operation, dns.qname
```
We could even remove qname from this process to create ad-hoc counters on other elements. What if we wanted to count how many times we get queries to a specific IP address, across all protocols:
```yaml
unique-fields: dnstap.identity, dnstap.operation, network.response-ip, network.protocol
```
How about summarizing the query counts from each country based on the geoip transform? Theoretically possible:
```yaml
unique-fields: dnstap.identity, dnstap.operation, geoip.country-isocode
```
Only report on results that have exactly the same set of resource records in the reply:
```yaml
unique-fields: dnstap.identity, dnstap.operation, dns.qname, dns.resource-records.*
```
Summarize the number of times any particular NS record is seen (note: this includes cases where ns is empty, but those all collapse into a single summary packet, so that's not a big deal):
```yaml
unique-fields: dnstap.identity, dnstap.operation, dns.resource-records.ns
```
I admit that some of these are unusual or seem to be "edge-case" examples. I would argue that as a service or network grows in size, the primary task becomes handling edge-cases. :-)
There are a few "weird" fields that probably need special handling:
Note: the "ttl" field is ignored in resource-records and cannot be used as a uniqueless identifier. Note: timestamps in summary records will reflect the first entry that was unique was seen, and timestamps cannot be used as uniqueness identifiers. Note: dnstap.latency is kept as an average of all samples and reported, and cannot be used as a uniqueness identifier.
Note: the "reducer:" label also needs an additional field in addition to "occurences" and "cumulative-length" and that is: "watch-interval" since that is required to understand the expansion of the summary set across time.
Question: I'm unclear on how data is expressed in a summary packet that represents many packets. I would expect everything other than the matched data to be empty, because it is not possible (or even desirable) to express the other parts of the prior packets; for example, there is no way to represent an average of "dns.flags.ad=1" and "dns.flags.ad=0" if they differ between two aggregated answers. Why is there still data in those fields when aggregated answers are transmitted? Any field that is not part of the uniqueness set should be emptied. Or am I misunderstanding the summary result?
Question: I am not sure when the watch-interval counter starts for the reducer. Does it start the first time the reducer sees an item (unique or not), or does it start at some absolute moment in time? In other words: does every item in the memory queue have its own counter that triggers a transmission when it reaches zero, or is there a single counter after which all summary packets are sent at once? I hope it's the former, because a single counter creates a problem: with a watch-interval of 30 seconds and very high cardinality - say 100,000 objects - everything tries to transmit at once, which amounts to a DoS against our central dnstap collector.
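To illustrate why the per-item behavior matters, here is a sketch of the per-key variant: each key's first sighting arms its own flush timer, so transmissions stay spread out even at high cardinality. (A single global tick could instead be mitigated with jitter.) The flusher type and its methods are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// flusher schedules one timer per uniqueness key, so summaries drain
// continuously instead of all firing on one global tick (which, at
// 100,000 keys, would burst-flood the central collector).
type flusher struct {
	mu       sync.Mutex
	counters map[string]int
	interval time.Duration // the watch-interval
}

func (f *flusher) observe(key string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.counters[key]++
	if f.counters[key] == 1 {
		// First sighting of this key starts its own watch-interval.
		time.AfterFunc(f.interval, func() { f.flush(key) })
	}
}

func (f *flusher) flush(key string) {
	f.mu.Lock()
	n := f.counters[key]
	delete(f.counters, key)
	f.mu.Unlock()
	fmt.Printf("summary: key=%q occurrences=%d\n", key, n)
}

func main() {
	f := &flusher{counters: map[string]int{}, interval: 2 * time.Second}
	for i := 0; i < 5; i++ {
		f.observe("dnstap.identity=resolver-1;dns.qname=example.com.")
	}
	time.Sleep(3 * time.Second) // let the per-key timer fire
}
```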