Unexpected packet loss with v0.5.0

Open dswarbrick opened this issue 1 year ago • 2 comments

pro-bing v0.5.0 seems to have a severe regression relating to packet loss. Pinging a host on a home WiFi network with v0.4.1 succeeded as expected:

~/src/pro-bing ((HEAD detached at v0.4.1))$ go run cmd/ping/ping.go -t 10s 192.168.1.1
PING 192.168.1.1 (192.168.1.1):
32 bytes from 192.168.1.1: icmp_seq=0 time=4.835066ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=1 time=5.048004ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=2 time=5.108371ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=3 time=5.04682ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=4 time=5.15136ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=5 time=4.051555ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=6 time=5.45183ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=7 time=4.222143ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=8 time=4.014752ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=9 time=4.182697ms ttl=64

--- 192.168.1.1 ping statistics ---
10 packets transmitted, 10 packets received, 0 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 4.014752ms/4.711261ms/5.45183ms/507.947µs

However, with v0.5.0, the packet loss is extreme:

~/src/pro-bing ((HEAD detached at v0.5.0))$ go run cmd/ping/ping.go -t 10s 192.168.1.1

PING 192.168.1.1 (192.168.1.1):
32 bytes from 192.168.1.1: icmp_seq=1 time=4.087365ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=4 time=3.835809ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=7 time=4.251692ms ttl=64

--- 192.168.1.1 ping statistics ---
10 packets transmitted, 3 packets received, 0 duplicates, 70% packet loss
round-trip min/avg/max/stddev = 3.835809ms/4.058288ms/4.251692ms/171.024µs

The results are quite reproducible by switching back and forth between the versions and repeating the test. In a subsequent test with v0.5.0, 100% packet loss was observed.

Given how dramatic this change is, I think that v0.5.0 should be retracted.

dswarbrick • Nov 30 '24 15:11

Running a git bisect seems to point the finger at #120. Testing the commit immediately before:

~/src/pro-bing ((HEAD detached at 622e8b3))$ go run cmd/ping/ping.go -t 10s 192.168.1.1
PING 192.168.1.1 (192.168.1.1):
32 bytes from 192.168.1.1: icmp_seq=0 time=3.936235ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=1 time=3.593734ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=2 time=3.851939ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=3 time=3.591965ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=4 time=4.033911ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=5 time=3.589511ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=6 time=4.725205ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=7 time=3.951019ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=8 time=3.601376ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=9 time=24.897539ms ttl=64

--- 192.168.1.1 ping statistics ---
10 packets transmitted, 10 packets received, 0 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 3.589511ms/5.977243ms/24.897539ms/6.315272ms

And then testing the commit which landed that PR:

~/src/pro-bing ((HEAD detached at 543f9b2))$ go run cmd/ping/ping.go -t 10s 192.168.1.1
PING 192.168.1.1 (192.168.1.1):
32 bytes from 192.168.1.1: icmp_seq=0 time=3.443049ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=2 time=3.799941ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=4 time=3.688231ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=6 time=4.165249ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=8 time=4.639888ms ttl=64

--- 192.168.1.1 ping statistics ---
10 packets transmitted, 5 packets received, 0 duplicates, 50% packet loss
round-trip min/avg/max/stddev = 3.443049ms/3.947271ms/4.639888ms/417.2µs

dswarbrick • Nov 30 '24 16:11

Assuming that this bizarre packet loss is related to setting the traffic class on the outgoing ICMP echo request, I tried adjusting the traffic class with the -Q option of the included cmd/ping tool. A value of 128 resulted in a loss-free test:

$ go run cmd/ping/ping.go -Q 128 -t 10s 192.168.1.1

PING 192.168.1.1 (192.168.1.1):
32 bytes from 192.168.1.1: icmp_seq=0 time=2.089062ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=1 time=4.554064ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=2 time=4.303305ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=3 time=4.663038ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=4 time=5.689999ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=5 time=4.327294ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=6 time=9.331246ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=7 time=9.820285ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=8 time=10.847874ms ttl=64
32 bytes from 192.168.1.1: icmp_seq=9 time=4.414933ms ttl=64

--- 192.168.1.1 ping statistics ---
10 packets transmitted, 10 packets received, 0 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 2.089062ms/6.004108ms/10.847874ms/2.768617ms

Passing -Q 0 also yields a loss-free test, and effectively mimics the behaviour of v0.4.1, which did not implement traffic class at all. However, the new pro-bing default traffic class of 192 (DSCP CS6) appears to be unreliable on this network, with losses ranging from 50% to 100% in the tests above.

I am able to reproduce this behaviour with the standard Linux iputils ping tool (in either SOCK_RAW or SOCK_DGRAM mode) with -Q 192 on this target host, and others on the same network. Given that the classic ping tool does not set traffic class by default, I wonder whether it is wise for pro-bing to assume that all devices will play nicely with DSCP CS6.

dswarbrick • Nov 30 '24 16:11