DNSTAP compression - selectable per-stream

Open johnhtodd opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe.
Compression/decompression should be an option on any DNSTAP or TCP transmitted stream.

Describe the solution you'd like
Any data that transits via a DNSTAP or raw TCP session should have the option of being compressed or uncompressed. This may "break" the other side if it is not a go-dnscollector consumer, but in many cases the other side is one, such as when DNSTAP messages are transmitted from many edge instances to a central DNSTAP collector (which is also running go-dnscollector).

Describe alternatives you've considered
Forwarding sockets (localhost:1234) that push data through a compression tool, which then forwards the data on, could be used, but that's ugly and the configuration is extremely difficult to keep track of. A VPN with compression could be used, but many VPNs do not support compression and therefore are not options (and most places will not replace their VPN for this one benefit).

Additional context
DNSTAP messages are highly compressible. They can be sent in reasonably large blocks, which enables significant compression for transmission over long-haul network links. While DNSTAP does not natively support compression, it seems not unreasonable that go-dnscollector could have a configurable compression flag that would mark a stream as being compressed with one of the compression methods already supported in other areas of the code. This would allow much more efficient transmission of DNSTAP-based messages through various components.
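To make the "large blocks compress well" point concrete, here is a minimal Go sketch, not go-dnscollector code. It assumes a made-up framing (a 4-byte length prefix per message, whole batch gzipped as one stream) and uses synthetic text lines as stand-ins for real DNSTAP protobuf payloads. The names packBatch, unpackBatch, sampleBatch and rawSize are hypothetical, and the ratio you see on real traffic will differ.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
	"io"
)

// sampleBatch builds n synthetic, repetitive messages.
// Assumption: these lines only imitate DNS telemetry; they are not DNSTAP protobufs.
func sampleBatch(n int) [][]byte {
	batch := make([][]byte, 0, n)
	for i := 0; i < n; i++ {
		line := fmt.Sprintf("CLIENT_QUERY 192.0.2.%d host%d.example.com. IN A", i%250, i%50)
		batch = append(batch, []byte(line))
	}
	return batch
}

// rawSize is the uncompressed size of the batch including length prefixes.
func rawSize(batch [][]byte) int {
	total := 0
	for _, p := range batch {
		total += 4 + len(p)
	}
	return total
}

// packBatch writes each payload with a 4-byte big-endian length prefix
// and gzips the whole batch as a single stream.
func packBatch(batch [][]byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	var hdr [4]byte
	for _, p := range batch {
		binary.BigEndian.PutUint32(hdr[:], uint32(len(p)))
		if _, err := zw.Write(hdr[:]); err != nil {
			return nil, err
		}
		if _, err := zw.Write(p); err != nil {
			return nil, err
		}
	}
	// Close flushes the remaining compressed data and the gzip trailer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unpackBatch reverses packBatch on the receiving side.
func unpackBatch(data []byte) ([][]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var out [][]byte
	var hdr [4]byte
	for {
		if _, err := io.ReadFull(zr, hdr[:]); err != nil {
			if err == io.EOF {
				return out, nil // clean end of batch
			}
			return nil, err
		}
		p := make([]byte, binary.BigEndian.Uint32(hdr[:]))
		if _, err := io.ReadFull(zr, p); err != nil {
			return nil, err
		}
		out = append(out, p)
	}
}

func main() {
	batch := sampleBatch(1000)
	packed, err := packBatch(batch)
	if err != nil {
		panic(err)
	}
	restored, err := unpackBatch(packed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("messages=%d raw=%d bytes gzip=%d bytes\n", len(restored), rawSize(batch), len(packed))
}
```

Compressing the batch as one stream, rather than message by message, is what lets the compressor exploit repetition across messages (same qnames, same client subnets).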

Example config:

dnstap:
  transport: tcp
  compression: zstd
  remote-address: 10.0.0.1
  remote-port: 6000

It may be useful in the future to allow configuration of additional compression flags based on the method used, since some of the compression methods have better CPU performance or compression ratios depending on buffer sizes, etc. (see https://home.apache.org/~dongjin/post/apache-kafka-improved-compression/#:~:text=In%20general%2C%20the%20compressed%20ratio,compression%2Fdecompression%20speed%20and%20size.)

johnhtodd avatar Nov 30 '23 20:11 johnhtodd

Hey John - Jim Mozley introduced us a few months ago in a conference call

We solve the issue you describe by pushing the data from go-dnscollector into vector - we're solving most of our telemetry issues with vector now. The only small problem is that vector can't read the protobuf from recursor (we still don't know why), so we're using go-dnscollector as the glue in between - we just need it to populate one more field from recursor, so I'm working on that right now - but it won't compile!

james-stevens avatar Dec 05 '23 17:12 james-stevens

Hi! Good to understand what you're using for ingestion, though it doesn't look like vector does much with the DNS data itself (the manual doesn't seem to reference any native unpacking of qtypes/rr's or other DNS object data.)

I might suggest that if Vector (a product by DataDog - https://vector.dev/docs/reference/configuration/sources/dnstap/) is ingesting your data, that you open a ticket with them referencing this ticket, to see if DataDog & Denis (or whoever writes the compression patch on go-dnscollector) can coordinate how to compress/uncompress data in DNSTAP, even if it is just static config-file declarations at each end and not auto-negotiated. If two implementations of DNSTAP consumer/producers can agree on how to compress/uncompress data, then we instantly have a standard. I'm sure compression would be welcome in any transmission model sending data from your DNSTAP origins to the backend at DataDog, especially given how very compressible this data is.
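As an illustration of the "static config-file declarations at each end" idea, here is a minimal Go sketch. It assumes each side reads a codec name from its own config and wraps the stream accordingly. The codec names and the helpers newEncoder, newDecoder and roundTrip are hypothetical, not the go-dnscollector or Vector API, and only codecs from the Go standard library are used.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"compress/zlib"
	"fmt"
	"io"
)

// nopWriteCloser lets the "none" codec share the io.WriteCloser interface.
type nopWriteCloser struct{ io.Writer }

func (nopWriteCloser) Close() error { return nil }

// newEncoder wraps w according to the codec name declared in the sender's config.
func newEncoder(codec string, w io.Writer) (io.WriteCloser, error) {
	switch codec {
	case "none":
		return nopWriteCloser{w}, nil
	case "gzip":
		return gzip.NewWriter(w), nil
	case "zlib":
		return zlib.NewWriter(w), nil
	default:
		return nil, fmt.Errorf("unsupported codec %q", codec)
	}
}

// newDecoder wraps r according to the codec name declared in the receiver's config.
func newDecoder(codec string, r io.Reader) (io.ReadCloser, error) {
	switch codec {
	case "none":
		return io.NopCloser(r), nil
	case "gzip":
		zr, err := gzip.NewReader(r)
		if err != nil {
			return nil, err
		}
		return zr, nil
	case "zlib":
		return zlib.NewReader(r)
	default:
		return nil, fmt.Errorf("unsupported codec %q", codec)
	}
}

// roundTrip simulates one message crossing the wire with independently
// configured ends. There is no negotiation: a mismatch is only detected
// when the receiver fails to parse the stream header.
func roundTrip(senderCodec, receiverCodec string, msg []byte) ([]byte, error) {
	var wire bytes.Buffer
	enc, err := newEncoder(senderCodec, &wire)
	if err != nil {
		return nil, err
	}
	if _, err := enc.Write(msg); err != nil {
		return nil, err
	}
	if err := enc.Close(); err != nil {
		return nil, err
	}
	dec, err := newDecoder(receiverCodec, &wire)
	if err != nil {
		return nil, err
	}
	defer dec.Close()
	return io.ReadAll(dec)
}

func main() {
	msg := []byte("example.com. IN A")
	out, err := roundTrip("gzip", "gzip", msg)
	fmt.Printf("matching config: %q err=%v\n", out, err)
	_, err = roundTrip("gzip", "zlib", msg)
	fmt.Printf("mismatched config: err=%v\n", err)
}
```

Note that a sender using a compressed codec with a receiver set to "none" would not raise an error in this sketch; the receiver would simply get compressed bytes, which is one reason a declared flag has to be kept in sync at both ends.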

johnhtodd avatar Dec 05 '23 18:12 johnhtodd

We run Vector on the resolver box itself, which can ingest the dnstap directly and converts it to a Vector event (basically JSON). Then we run Vector to Vector over the long haul, which has gzip as an option. We see about 14.2:1 compression.

https://vector.dev/docs/reference/configuration/sources/dnstap/
https://vector.dev/docs/reference/configuration/sinks/vector/#compression

Vector can also do things like random event selection built-in, which you mentioned you use.

One downside is that the JSON that Vector produces from the dnstap seems to be a proprietary format (there is an RFC DNS JSON format, which it does not appear to follow), but it does include a unique time-stamp field in the query that is repeated in the response, for matching them up.

james-stevens avatar Dec 05 '23 18:12 james-stevens

Compression implemented in next release v0.43.0!

dmachard avatar Mar 23 '24 19:03 dmachard