go-dnscollector
DNSTAP compression - selectable per-stream
Is your feature request related to a problem? Please describe. Compression/decompression should be an option on any DNSTAP or TCP transmitted stream
Describe the solution you'd like Any data that transits a DNSTAP or raw TCP session should have the option of being compressed or left uncompressed. This may "break" the other side if it is not a go-dnscollector consumer, but in many cases it is one, such as when DNSTAP messages are transmitted from many edge instances to a central DNSTAP collector that is also running go-dnscollector.
Describe alternatives you've considered Using forwarding sockets (localhost:1234) that push data through a compression tool that then forwards the data on could be used, but that's ugly and configuration is extremely difficult to keep track of. A VPN with compression could be used, but many VPNs do not support compression and therefore are not options (and most places will not replace their VPN for this one benefit.)
Additional context DNSTAP messages are highly compressible. They can be sent in reasonably large blocks, which enables significant compression for transmission over long-haul network links. While DNSTAP does not natively support compression, it seems not unreasonable that go-dnscollector could have a configurable compression flag that would mark a stream as being compressed with one of the different models of compression that are supported in other areas of the code currently. This would allow a much more efficient transmission of DNSTAP-based messages through various components.
Example config:
dnstap:
  transport: tcp
  compression: zstd
  remote-address: 10.0.0.1
  remote-port: 6000
It may be useful in the future to allow additional compression flags to be configured depending on the method used, since some compression methods trade CPU performance against compression ratio differently depending on buffer size, etc. (see https://home.apache.org/~dongjin/post/apache-kafka-improved-compression/#:~:text=In%20general%2C%20the%20compressed%20ratio,compression%2Fdecompression%20speed%20and%20size.)
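To illustrate the block-compression idea in the request, here is a minimal Go sketch that batches several messages into one gzip-compressed block and unpacks them again. It is not go-dnscollector's implementation: the function names compressBatch and decompressBatch, the 4-byte length prefix, and the sample payloads are all assumptions made for this example, and gzip is used only because it ships with the Go standard library (zstd would need a third-party package).

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
	"io"
)

// compressBatch prefixes each message with a 4-byte big-endian length and
// gzips the whole batch as a single block. The framing is a hypothetical
// choice for this sketch, not the DNSTAP Frame Streams wire format.
func compressBatch(msgs [][]byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	var hdr [4]byte
	for _, m := range msgs {
		binary.BigEndian.PutUint32(hdr[:], uint32(len(m)))
		if _, err := zw.Write(hdr[:]); err != nil {
			return nil, err
		}
		if _, err := zw.Write(m); err != nil {
			return nil, err
		}
	}
	// Close flushes the remaining compressed data and the gzip trailer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompressBatch reverses compressBatch and returns the original messages.
// A real receiver should also cap the declared length before allocating.
func decompressBatch(data []byte) ([][]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var msgs [][]byte
	var hdr [4]byte
	for {
		_, err := io.ReadFull(zr, hdr[:])
		if err == io.EOF {
			return msgs, nil // clean end of the batch
		}
		if err != nil {
			return nil, err
		}
		m := make([]byte, binary.BigEndian.Uint32(hdr[:]))
		if _, err := io.ReadFull(zr, m); err != nil {
			return nil, err
		}
		msgs = append(msgs, m)
	}
}

func main() {
	// Placeholder payloads standing in for serialized DNSTAP messages.
	var msgs [][]byte
	raw := 0
	for i := 0; i < 1000; i++ {
		m := []byte(fmt.Sprintf("query %d: www.example.com. IN A", i))
		raw += len(m)
		msgs = append(msgs, m)
	}
	packed, err := compressBatch(msgs)
	if err != nil {
		panic(err)
	}
	out, err := decompressBatch(packed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("messages=%d raw_bytes=%d compressed_bytes=%d\n", len(out), raw, len(packed))
}
```

Because both ends must agree on the framing and the algorithm, a per-stream setting like the compression key in the example config would have to be set identically on the sender and the receiver.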
Hey John - Jim Mozley introduced us a few months ago in a conference call
We solve the issue you describe by pushing the data from go-dnscollector into vector - we're solving most of our telemetry issues with vector now. The only small problem is that vector can't read the protobuf from recursor (we still don't know why), so we're using go-dnscollector as the glue in between. We just need it to populate one more field from recursor, so I'm working on that right now, but it won't compile!
Hi! Good to understand what you're using for ingestion, though it doesn't look like vector does much with the DNS data itself (the manual doesn't seem to reference any native unpacking of qtypes/rr's or other DNS object data.)
I might suggest that if Vector (a product by DataDog - https://vector.dev/docs/reference/configuration/sources/dnstap/) is ingesting your data, that you open a ticket with them referencing this ticket, to see if DataDog & Denis (or whoever writes the compression patch on go-dnscollector) can coordinate how to compress/uncompress data in DNSTAP, even if it is just static config-file declarations at each end and not auto-negotiated. If two implementations of DNSTAP consumer/producers can agree on how to compress/uncompress data, then we instantly have a standard. I'm sure compression would be welcome in any transmission model sending data from your DNSTAP origins to the backend at DataDog, especially given how very compressible this data is.
We run a vector on the resolver box itself, which can ingest the dnstap directly, converting it to a vector event which is basically JSON. Then we run vector to vector (over the long haul), which has gzip as an option. We see about 14.2:1 compression.
https://vector.dev/docs/reference/configuration/sources/dnstap/ https://vector.dev/docs/reference/configuration/sinks/vector/#compression
Vector can also do things like random event selection built-in, which you mentioned you use.
One downside is that the JSON vector produces from the dnstap seems to be a proprietary format (even though there is an RFC DNS JSON format), but it does include a unique time-stamp field in the query that is repeated in the response, for matching them up.
Compression implemented in next release v0.43.0!