igmpproxy sometimes removes the mroute if a "Membership Report" message is received close to the "Max Resp Time" value specified in the "Membership Query" message
I have noticed that occasionally igmpproxy removes the multicast route if a "Membership Report" message is received close to the "Max Resp Time" value specified in the "Membership Query" message. For example, please see the packet capture below:
root@r1:~# tshark -Y 'frame.number in {25..27}' -r br0_igmp.pcap
Running as user "root" and group "root". This could be dangerous.
25 190.655057 192.168.0.1 → 224.0.0.1 IGMPv2 46 Membership Query, general
26 196.659055 192.168.0.1 → 224.0.0.1 IGMPv2 46 Membership Query, general
27 200.665434 192.168.0.12 → 239.3.1.106 IGMPv2 60 Membership Report group 239.3.1.106
root@r1:~#
This packet capture was made on a LAN-facing interface named br0 in a router running igmpproxy. As seen above, the "Membership Report" message was received 10.010377 seconds after the initial "Membership Query" message, while the "Max Resp Time" value in the "Membership Query" messages was 10.0 seconds. This caused igmpproxy to remove the multicast route:
Sent membership query from 192.168.0.1 ifIndex 11 to 224.0.0.1. Delay: 10
Created timeout 11 (#0) - delay 10 secs
(Id:11, Time:10)
Created timeout 12 (#1) - delay 115 secs
(Id:11, Time:10)
(Id:12, Time:115)
About to call timeout 11 (#0)
Aging routes in table.
Removing group 239.3.1.106. Died of old age.
Removed route entry for 239.3.1.106 from table.
Vif bits : 0x00000001
Setting TTL for Vif 0 to 1
Removing MFC: 10.0.2.28 -> 239.3.1.106, InpVIf: 1
One might think that this is expected behavior, since the "Membership Report" message was received later than the "Max Resp Time" of 10.0 seconds. However, according to RFC 2236 Section 3, a host receiving a "General Membership Query" sets a delay timer to a random value in the range (0, Max Response Time], i.e. it is allowed to answer at the 10.0th second, and once network and processing delays are added on top, it is IMHO expected that "Membership Report" messages will occasionally arrive a tiny bit later than 10.0 seconds.
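To put a rough number on that, here is a small back-of-the-envelope Python sketch (not part of the lab; the 10 ms of extra delay is purely an assumption) modeling a host that picks its report delay uniformly from (0, Max Resp Time] per RFC 2236:

import random

MAX_RESP_TIME = 10.0  # seconds, as advertised in the "Membership Query"
EXTRA_DELAY = 0.010   # assumed network + processing delay, seconds
TRIALS = 1_000_000

# RFC 2236: the host picks a random report delay from (0, Max Resp Time].
late = sum(random.uniform(0.0, MAX_RESP_TIME) + EXTRA_DELAY > MAX_RESP_TIME
           for _ in range(TRIALS))

# With a uniform timer the overshoot probability is roughly
# EXTRA_DELAY / MAX_RESP_TIME, i.e. about 0.1% here -- rare, but
# inevitable given enough query/report rounds.
print(f"{late / TRIALS:.4%} of reports arrive after Max Resp Time")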
In order to troubleshoot this better, I built a small lab with the following topology:
h3[ens3] <-> [ens3]gw1[ens4] <-> [ens3]mr1
h3 is a virtual machine in the multicast receiver role running socat -u UDP4-RECV:10,ip-add-membership=239.3.5.3:ens3 -. gw1 is a multicast router running igmpproxy version 0.4 with the configuration below:

quickleave
phyint ens4 upstream ratelimit 0 threshold 1
        altnet 10.0.0.0/8
phyint ens3 downstream ratelimit 0 threshold 1

mr1 is in the multicast sender role, running send(IP(src='10.0.2.28', dst='239.3.5.3')/UDP(sport=10, dport=10), iface='ens3', inter=1./2, loop=1) in scapy, which sends UDP datagrams addressed to 239.3.5.3 twice a second via the ens3 interface in an endless loop. All three virtual machines are running Debian 10.
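For reference, a plain-Python equivalent of the socat receiver above could look like the sketch below (standard socket API only; unlike the socat command, it joins via INADDR_ANY instead of pinning the membership to ens3):

import socket
import struct

GROUP = "239.3.5.3"
PORT = 10

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Joining the group makes the kernel send the IGMP "Membership Report"
# and answer subsequent "Membership Query" messages for 239.3.5.3.
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    data, addr = sock.recvfrom(1500)
    print(f"{len(data)} bytes from {addr[0]}:{addr[1]}")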
I have observed a similar issue in this lab setup. One occurrence can be seen in the screenshot below:
Upper tmux panes are h3 running tcpdump and socat, respectively. Middle panes are src/igmpproxy -d -vv /etc/igmpproxy.conf 2>&1 piped through a timestamping utility, and a custom packet capture utility. The lower right pane is running the scapy command mentioned above on mr1, and the lower left pane prints a timestamp on gw1 whenever the multicast route is missing:
while true; do ip mroute | grep -q '(10.0.2.28,239.3.5.3) *Iif: ens4 *Oifs: ens3 State: resolved' || date +'%T.%3N'; done
As highlighted in the screenshot, the multicast route was removed 0.9 seconds before the "Membership Report" message was received. The "Membership Report" message was received 10.153 seconds after the initial "Membership Query". However, I have seen this happen even when the delay between the query and the report was only 9.6 seconds. In addition, the multicast route is not removed every time the delay is >10.0 seconds.
In order to avoid the removal of the multicast route, I relaxed the pselect() timeout. Doubling it only delays the timer-driven wakeup; pselect() still returns as soon as the IGMP socket becomes readable, so a slightly late "Membership Report" can refresh the group before the aging callback runs:
root@gw1:~/igmpproxy# diff -u src/igmpproxy.c{~,}
--- src/igmpproxy.c~ 2023-05-05 01:09:08.551415847 +0300
+++ src/igmpproxy.c 2023-05-05 01:09:27.418989595 +0300
@@ -296,7 +296,7 @@
int MaxFD, Rt, secs;
fd_set ReadFDS;
socklen_t dummy = 0;
- struct timespec curtime, lasttime, difftime, tv;
+ struct timespec curtime, lasttime, difftime, tv, pselect_timeout;
// The timeout is a pointer in order to set it to NULL if nessecary.
struct timespec *timeout = &tv;
@@ -339,8 +339,12 @@
FD_ZERO( &ReadFDS );
FD_SET( MRouterFD, &ReadFDS );
+ memcpy(&pselect_timeout, timeout, sizeof(struct timespec));
+ pselect_timeout.tv_sec *= 2;
+ my_log(LOG_DEBUG, 0, "Calling pselect() with %ld.%09ld seconds timeout", pselect_timeout.tv_sec, pselect_timeout.tv_nsec);
+
// wait for input
- Rt = pselect( MaxFD +1, &ReadFDS, NULL, NULL, timeout, NULL );
+ Rt = pselect( MaxFD +1, &ReadFDS, NULL, NULL, &pselect_timeout, NULL );
// log and ignore failures
if( Rt < 0 ) {
root@gw1:~/igmpproxy#
Alternatively, at least in a lab environment, one could send frequent "Membership Report" messages from the multicast receiver host, e.g. send(IP(dst="239.3.5.3")/IGMP(type=0x16, gaddr="239.3.5.3"), iface="ens3", inter=1./1, loop=1) in scapy.
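A less blunt variant of that workaround would be to answer each incoming "Membership Query" immediately instead of reporting unconditionally every second. Here is a sketch using scapy's contrib IGMP layer (type 0x11 is a Membership Query, 0x16 an IGMPv2 Membership Report; replying immediately deliberately skips the random delay mandated by RFC 2236, which seems acceptable in a lab):

from scapy.all import IP, send, sniff
from scapy.contrib.igmp import IGMP

GROUP = "239.3.5.3"
IFACE = "ens3"

def answer_query(pkt):
    # Reply to every IGMP "Membership Query" (type 0x11) right away, so
    # the report cannot arrive later than the advertised "Max Resp Time".
    if IGMP in pkt and pkt[IGMP].type == 0x11:
        send(IP(dst=GROUP) / IGMP(type=0x16, gaddr=GROUP),
             iface=IFACE, verbose=False)

sniff(iface=IFACE, filter="igmp", prn=answer_query)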
Has anyone else observed this behavior? Please let me know if I can provide additional details.