go-dnscollector
go-dnscollector copied to clipboard
file logger: buffer still full despite "chan-buffer-size" changes
Many thanks for your time and effort with go-dnscollector!
We are using go-dnscollector as part of the pDNSSOC project (https://github.com/CERN-CERT/pDNSSOC)
We process incoming dnstap files for a number of resolvers with little/no control over the files size or content. We feed go-dnscollectors with dnstap files in a directory "watched" by the ingestor. Each file is about 50MB in size.
Unfortunately we suffer from a buffer full, dropping packets error. We typically loose hundreds of thousands of packets, and our logs are flooded with "buffer full" error messages.
We tried increasing the value of "chan-buffer-size" to a larger value (e.g. 10485760) but it does not change the amount of errors or the amount of lost packets in each error.
I am not sure this is a bug or just a configuration issue, or just us doing it wrong, or what we can do about this. Any advice welcome.
Current configuration:
- name: dns_server_1
file-ingestor:
enable: true
watch-dir: /var/pdnssoc_input/dns_server_1
watch-mode: dnstap
delete-after: true
chan-buffer-size: 10485760
transforms:
filtering:
log-queries: true
log-replies: true
drop-queryip-file: /etc/dnscollector/dns_servers.txt
Could you share more details ?
- the complete config file
- version used
- and output logs with buffer full messages
Sure! Thanks Denis.
We are using 0.41.0.
Config:
global:
trace:
verbose: true
text-format: "timestamp-rfc3339ns identity operation rcode queryip queryport family protocol length qname qtype latency"
# default text field delimiter
text-format-delimiter: " "
# default text field boundary
text-format-boundary: "\""
multiplexer:
collectors:
- name: dnstap-dns-server-1
file-ingestor:
enable: true
watch-dir: /var/pdnssoc_input/dns-server-1
watch-mode: dnstap
delete-after: true
chan-buffer-size: 10485760
transforms:
filtering:
log-queries: true
log-replies: true
drop-queryip-file: /root/dns_servers.txt
loggers:
- name: filelogdomains
logfile:
file-path: /var/dnscollector/matches/matches_domains.json
mode: json
transforms:
filtering:
keep-fqdn-file: '/var/dnscollector/misp_domains.txt'
- name: filelogips
logfile:
file-path: /var/dnscollector/matches/matches_ips.json
mode: json
transforms:
filtering:
keep-rdata-file: '/var/dnscollector/misp_ips.txt'
- name: fileall
logfile:
file-path: /var/dnscollector/queries.log
mode: json
max-size: 100
max-files: 100
postrotate-command: "/var/dnscollector/postrotate_query.sh"
postrotate-delete-success: true
routes:
- from: [ dnstap-dns-server-1 ]
to: [ filelogdomains, filelogips, fileall ]
Log files:
INFO: 2024/03/06 19:07:30.196568 collector - [dnstap-dns-server-1] fileingestor - processing of [dnstap-2024-03-02-18:56:53.fstrm] terminated
INFO: 2024/03/06 19:07:30.196586 collector - [dnstap-dns-server-1] fileingestor - delete file [dnstap-2024-03-02-18:56:53.fstrm]
INFO: 2024/03/06 19:07:30.439693 collector - [dnstap-dns-server-1] fileingestor - processing of [dnstap-2024-03-02-10:56:53.fstrm] terminated
INFO: 2024/03/06 19:07:30.439721 collector - [dnstap-dns-server-1] fileingestor - delete file [dnstap-2024-03-02-10:56:53.fstrm]
INFO: 2024/03/06 19:07:30.608676 collector - [dnstap-dns-server-1] fileingestor - processing of [dnstap-2024-03-02-02:56:53.fstrm] terminated
INFO: 2024/03/06 19:07:30.608694 collector - [dnstap-dns-server-1] fileingestor - delete file [dnstap-2024-03-02-02:56:53.fstrm]
ERROR: 2024/03/06 19:07:33.763154 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 364989 packet(s) dropped
The error would repeat over and over:
ERROR: 2024/03/03 04:16:40.594323 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 647689 packet(s) dropped
ERROR: 2024/03/03 04:16:50.602279 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 741568 packet(s) dropped
ERROR: 2024/03/03 04:17:00.726706 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 665186 packet(s) dropped
ERROR: 2024/03/03 04:17:10.824914 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 318581 packet(s) dropped
ERROR: 2024/03/03 04:17:20.825447 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 123588 packet(s) dropped
ERROR: 2024/03/03 04:17:30.895391 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 84076 packet(s) dropped
ERROR: 2024/03/03 04:18:43.438253 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 232958 packet(s) dropped
ERROR: 2024/03/03 04:18:53.484891 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 646237 packet(s) dropped
ERROR: 2024/03/03 04:19:03.654005 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 484388 packet(s) dropped
ERROR: 2024/03/03 04:19:13.786948 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 403251 packet(s) dropped
(Just to give a volume indication, feeding on 34M dnstap file results in " dnstap - logger[fileall] buffer is full, 2003 packet(s) dropped" with the configuration file from the previous message)
In your case, the bottleneck is the fileall
logger, you can see that with logger[fileall] buffer is full, 647689 packet(s) dropped
.This message indicates that the buffer of the fileall
logger is full.
It occured because this logger is too slow regarding the amount of incoming data. To avoid to block the collector processing, the overhead of packets are dropped.
Could you try some adjustments
- If you remove the postrotate command, the drop of packet still occurs ?
- Try to increase the buffer of the logger
fileall
? Be carefull with this parameter (big impact on memory usage)
chan-buffer-size: 131070
Good thinking, thank you!
After testing, it turns out that:
- The postrotate script is causing the issue -- Disabling it solves the dropped packet issue.
- The buffer size of the logger
fileall
does not play a role. In fact, the default 65535 is sufficient.
For the record, the postrotate script we use is:
#!/bin/bash
BACKUP_FOLDER=/var/dnscollector/queries/$(date +%Y-%m-%d)
mkdir -p $BACKUP_FOLDER
FILE_NAME=$(basename $1 .log)
jq -c '. | {timestamp: .dnstap."timestamp-rfc3339ns", query: .dns.qname, client: .network."query-ip", server: .network."response-ip", client_id: .dnstap.identity , answers: .dns."resource-records".an }' $1 > $BACKUP_FOLDER/$FILE_NAME.json && gzip -S .gz_minified $BACKUP_FOLDER/$FILE_NAME.json
I will continue to investigate what is causing the bottleneck there and update this ticket with any possible solution.
@arvchristos @romainw The best way will be that the pdnssoc-cli uses natively the 'flat-json' model generated by DNScollector to avoid a new JSON generation from an external script. Also the file
logger support gzip
compression.
I think that the pDNSSOC and DNScollector integration can be greatly improved to support a higher load :)
Hey @dmachard !
The main reason why we relied on an external script for JSON generation is to be able to store only the required for correlation fields and save on size. I am not sure whether there is any transformer to remove fields from the JSON output but it seems useful for space optimization (combined with gzip
of course as you correctly pointed out).
We essentially need to store minified logs so that we can run correlation on past data with new indicators of compromise, apart from the live correlation we do with various other sources that go-dnscollector
supports.
I am not sure whether there is any transformer to remove fields from the JSON output
Great idea, can you open a ticket to track this?
I am not sure whether there is any transformer to remove fields from the JSON output
Great idea, can you open a ticket to track this?
Sure, here it is:
#642
Custom JSON relabeling has been implemented. It's will be implemented in next release v0.43.0. The documentation is here. Perhaps it can be used to improved performance.