Total bytes and packets discrepancy between two instances

Open CharlesMAtkinson opened this issue 3 years ago • 2 comments

As part of an upgrade, we have installed nfdump on a second server and configured our edge router to send netflows to it in the same way as the original. The two servers show very different total bytes and packets. Tested by downloading a 378 MB file:

c@CW10:/tmp$ date; wget https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-11.3.0-amd64-netinst.iso; date
Fri 10 Jun 12:26:22 IST 2022
...
378.00M
...
Fri 10 Jun 12:31:15 IST 2022
c@CW10:~$ ip a
... 10.10.50.118 ...

The result on the original server:

[email protected]:~$ nfdump -M /opt/nfsen/profiles-data/live/edge1 -R 2022/06/10/nfcapd.202206101225:2022/06/10/nfcapd.202206101230 'host 10.10.50.118' -o "fmt:%ts %pkt %byt %fl" | grep Summary
Summary: total flows: 1301, total bytes: 444.9 M, total packets: 426682, avg bps: 6.0 M, avg pps: 715, avg bpp: 1042

The result on the new server, which is not credible:

[email protected]:~$ nfdump -M /var/cache/nfdump/edge1 -R 2022/06/10/nfcapd.202206101225:2022/06/10/nfcapd.202206101230 'host 10.10.50.118' -o "fmt:%ts %pkt %byt %fl" | grep Summary
Summary: total flows: 668, total bytes: 9.3 M, total packets: 126014, avg bps: 123724, avg pps: 209, avg bpp: 73
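The two Summary lines above can be compared field by field. The following is a minimal sketch, not part of the original report: `summary_field` is a hypothetical helper name, and it assumes the Summary line keeps the `label: value` layout shown above. It only handles the fields that are plain numbers (flows, packets, avg bpp), not the ones with an `M` suffix.

```shell
#!/bin/bash
# Extract one numeric field from an nfdump "Summary:" line.
# $1 = field label (for example "total packets"), $2 = the Summary line.
# summary_field is a hypothetical helper, not an nfdump feature.
summary_field() {
    printf '%s\n' "$2" | sed -n "s/.*$1: \([0-9.]*\).*/\1/p"
}

# The two Summary lines reported above.
orig='Summary: total flows: 1301, total bytes: 444.9 M, total packets: 426682, avg bps: 6.0 M, avg pps: 715, avg bpp: 1042'
new='Summary: total flows: 668, total bytes: 9.3 M, total packets: 126014, avg bps: 123724, avg pps: 209, avg bpp: 73'

for label in "total flows" "total packets" "avg bpp"; do
    a=$(summary_field "$label" "$orig")
    b=$(summary_field "$label" "$new")
    # awk does the floating-point division that bash arithmetic lacks.
    awk -v l="$label" -v a="$a" -v b="$b" \
        'BEGIN { printf "%-14s original=%s new=%s new/original=%.2f\n", l, a, b, b / a }'
done
```

The new server reports roughly half the flows and fewer than a third of the packets, but its average bytes per packet is far lower still. That suggests the records it holds describe mostly small packets. This is an observation from the numbers, not a diagnosis.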

The original server is Debian Buster with the nfdump suite built from https://github.com/phaag/nfdump/releases/tag/v1.6.20. Its nfcapd command is /usr/local/bin/nfcapd -w -D -p 2056 -u nfsen -g www-data -B 200000 -S 1 -P /opt/nfsen/var/run/p2056.pid -z -I edge1 -l /opt/nfsen/profiles-data/live/edge1

The new server is Debian Bullseye with the nfdump suite from Debian package nfdump which is v1.6.22. Its nfcapd command is /usr/bin/nfcapd -D -P /tmp/nfcapd.edge1.pid -e -g nfcapd -l /var/cache/nfdump/edge1 -p 2055 -S1 -u nfcapd

CharlesMAtkinson, Jun 13 '22 07:06

From a distance, that's difficult to tell. I am almost sure it is something in your setup rather than a bug in nfcapd. For example, you may be automatically expiring flows on the second box. Do both collectors get the same data stream? Try to forward one data stream to the 2nd collector (-R). Use the same options everywhere. If the data rate is within a reasonable size, it should not make any difference.

phaag, Jun 19 '22 09:06

Thanks for the debugging suggestions, Peter

Try to forward one data stream to the 2nd collector (-R)

Tried that but tcpdump showed the repeat packets differed from the incoming packets. 192.168.3.1 is the sender, 192.168.8.31 is the first collector and 192.168.8.41 is the second collector

[email protected]:~# tcpdump --interface any -n port 2055 or port 2056
...
10:59:05.011714 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.011725 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.011752 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.011755 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.011763 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.011766 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.011843 IP 192.168.8.31.60181 > 192.168.8.41.2055: UDP, length 1404
10:59:05.071732 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.071746 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.071776 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.071780 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.071940 IP 192.168.8.31.60181 > 192.168.8.41.2055: UDP, length 1404
10:59:05.071953 IP 192.168.8.31.60181 > 192.168.8.41.2055: UDP, length 1404
10:59:05.072027 IP 192.168.8.31.60181 > 192.168.8.41.2055: UDP, length 1404
10:59:05.131769 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.131784 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.131816 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.131819 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.131827 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.131830 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.131921 IP 192.168.8.31.60181 > 192.168.8.41.2055: UDP, length 1404
10:59:05.191866 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.191885 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.191925 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.191928 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.191935 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 220
10:59:05.191938 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 220
10:59:05.191945 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.251631 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.251645 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.251678 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.251681 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.251689 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.251692 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.251781 IP 192.168.8.31.60181 > 192.168.8.41.2055: UDP, length 1404
10:59:05.311673 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.311684 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.311708 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.311711 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.311786 IP 192.168.8.31.60181 > 192.168.8.41.2055: UDP, length 1404
10:59:05.311794 IP 192.168.8.31.60181 > 192.168.8.41.2055: UDP, length 1404
10:59:05.311839 IP 192.168.8.31.60181 > 192.168.8.41.2055: UDP, length 1404
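To put a number on how the repeated packets differ from the incoming ones, the capture above can be tallied per source and destination pair. This is a minimal sketch that works on saved `tcpdump -n` text output in the format shown above (second field `IP`); `count_pairs` is a hypothetical helper name, and the three sample lines are copied from the capture.

```shell
#!/bin/bash
# Count tcpdump lines per "source > destination" pair.
# Reads tcpdump -n text output on stdin. Field 3 is the source and
# field 5 is the destination followed by a colon.
# count_pairs is a hypothetical helper, not part of tcpdump or nfdump.
count_pairs() {
    awk '$2 == "IP" { sub(/:$/, "", $5); count[$3 " > " $5]++ }
         END { for (p in count) print count[p], p }' | sort -rn
}

# Sample lines taken from the capture above.
count_pairs <<'EOF'
10:59:05.011763 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.011766 IP 192.168.3.1.2056 > 192.168.8.31.2056: UDP, length 1404
10:59:05.011843 IP 192.168.8.31.60181 > 192.168.8.41.2055: UDP, length 1404
EOF
```

Counting by hand, the 42 packet lines in the excerpt above contain 33 incoming packets (192.168.3.1 to 192.168.8.31) and 9 repeated packets (192.168.8.31 to 192.168.8.41).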

Use the same options everywhere

Tried that with the exception of -u and -g. Tested with -R and with the sender sending independently to both collectors. The discrepancy was similar to what I reported in the first comment above

In case the network was dropping packets, we looked at the netflow packets per second 1) on the sender, 2) on the first collector and 3) on the second collector. All four values were similar. FWIW here's the script used

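#!/bin/bash
# Count NetFlow packets per second as seen by tcpdump.
# INTERFACE and PORT must be set in the environment; run as root.
hhmmss_current=
n=0
# tcpdump timestamps look like 10:59:05.011714; splitting on "." drops the fraction.
while IFS=. read -r hhmmss trash
do
    if [[ $hhmmss != "$hhmmss_current" ]]; then
        # Skip the empty initial value so the first output line is not blank.
        [[ -n $hhmmss_current ]] && echo "$hhmmss_current: $n packets"
        hhmmss_current=$hhmmss
        n=0
    fi
    ((n++))
# -l makes tcpdump line-buffer its output so the counts appear promptly.
done < <(tcpdump --interface "$INTERFACE" -n -l port "$PORT")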

How can we investigate further?

CharlesMAtkinson, Jul 01 '22 05:07

Not sure if you made progress on this issue. As far as nfcapd and the repeater are concerned, it sends out each received packet as soon as it receives one. In your tcpdump output, it looks like only a few of the repeated packets actually end up on the wire. That's strange. To me it looks like the kernel is busy and drops packets for some reason. The buffer size set with -B is applied to the receiving side as well as the sending side. Something seems to be overloaded on your system. As it looks like a kernel- or network-related thing, I am closing this nfdump issue. Do not hesitate to reopen it if you think it's an nfdump issue.
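One way to check the suspicion that the kernel is dropping UDP packets is to read the system-wide UDP counters that Linux exposes in /proc/net/snmp. This is a minimal sketch under the assumption that both collectors run Linux (both are Debian according to the report). `udp_counter` is a hypothetical helper name, and the counters cover all UDP sockets on the host, not only nfcapd.

```shell
#!/bin/bash
# Print one UDP counter from a file in /proc/net/snmp format.
# $1 = counter name (for example RcvbufErrors), $2 = file to read.
# The first "Udp:" line holds the column names, the second the values.
# udp_counter is a hypothetical helper, not part of nfdump.
udp_counter() {
    awk -v name="$1" '
        $1 == "Udp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        $1 == "Udp:" { if (name in col) print $(col[name]); exit }
    ' "$2"
}

snmp_file=/proc/net/snmp
if [ -r "$snmp_file" ]; then
    for c in InDatagrams InErrors RcvbufErrors SndbufErrors; do
        printf '%s=%s\n' "$c" "$(udp_counter "$c" "$snmp_file")"
    done
fi
```

Running this twice, before and after a test download, and comparing the values may help: RcvbufErrors or SndbufErrors growing between the two runs would point to datagrams being dropped because a socket buffer was full.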

phaag, Dec 18 '22 14:12