
mwan3: unexpected behavior with 6in4 tunnel and netfilter marking

Open: yayfortrees opened this issue 2 years ago

Maintainer: @feckert
Environment: OpenWrt 21.02.3, mwan3 2.10.13-1

Description:

I am using mwan3 between a 6in4 tunnel (HE) with a static prefix and my ISP's dynamic IPv6 delegation. My hosts use addresses from the 6in4 tunnel prefix, but when the cable modem interface is available they go out through it after being translated to the dynamic delegation via NETMAP. This lets me accept incoming connections on static IPv6 addresses while still using my ISP's native dynamic delegation for outgoing connections. This is actually working great, but I have noticed an odd problem.
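For anyone unfamiliar with the setup, a minimal sketch of the kind of NETMAP rule this describes; the interface name and both prefixes here are placeholders, not my actual values:

# Rewrite the static tunnel prefix to the ISP-delegated prefix on egress
# via the native WAN; conntrack reverses the mapping for reply packets.
ip6tables -t nat -A POSTROUTING -o eth0.2 -s 2001:db8:1000:1::/64 -j NETMAP --to 2001:db8:2000:1::/64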

Incoming connections from the 6in4 tunnel are not triggering the mwan3_iface_in rules:

Chain mwan3_iface_in_hurricane (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 MARK       all      6in4-hurricane *       ::/0                 ::/0                 match-set mwan3_connected src mark match 0x0/0x3f00 /* default */ MARK or 0x3f00
    0     0 MARK       all      6in4-hurricane *       ::/0                 ::/0                 mark match 0x0/0x3f00 /* hurricane */ MARK xset 0x400/0x3f00

I added some logging at the top of the mwan3_hook chain to take a look at the incoming packets from the 6in4 tunnel and noticed that they had already been marked with 0x3f00, which was causing them to bypass most of the mwan3 rules.

kern.debug kernel: [30491.841853] MWAN3(debug)IN=6in4-hurricane OUT= MAC= SRC=2600:xxxx:xxxx:xxxx:0000:0000:0000:0001 DST=2001:xxxx:xxxx:0000:0000:0000:0000:3156 LEN=104 TC=0 HOPLIMIT=55 FLOWLBL=130007 PROTO=ICMPv6 TYPE=128 CODE=0 ID=11559 SEQ=0 MARK=0x3f00

Where was this mark coming from? The logging was happening before any rules in the IPv6 mangle table marked packets, so the only place I could think of was the IPv4 mangle table. Was the mark somehow carried over from when the packet came in on the IPv4 interface? To test this, I inserted a rule into the IPv4 mwan3_hook chain excluding anything arriving from the 6in4 tunnel endpoint.

iptables -t mangle -I mwan3_hook -i bond0.5 -s 184.105.250.46 -j RETURN

Immediately, incoming IPv6 connections started hitting the mwan3_iface_in rule:

Chain mwan3_iface_in_hurricane (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 MARK       all      6in4-hurricane *       ::/0                 ::/0                 match-set mwan3_connected src mark match 0x0/0x3f00 /* default */ MARK or 0x3f00
    9   936 MARK       all      6in4-hurricane *       ::/0                 ::/0                 mark match 0x0/0x3f00 /* hurricane */ MARK xset 0x400/0x3f00

and the mark is gone:

kern.debug kernel: [31785.307303] MWAN3(debug)IN=6in4-hurricane OUT= MAC= SRC=2600:xxxx:xxxx:xxxx:0000:0000:0000:0001 DST=2001:xxxx:xxxx:0000:0000:0000:0000:3156 LEN=104 TC=0 HOPLIMIT=55 FLOWLBL=130007 PROTO=ICMPv6 TYPE=128 CODE=0 ID=57133 SEQ=7

Is this expected behavior for netfilter, that a mark set on a 6in4 packet in the IPv4 mangle table automatically carries over to the IPv6 mangle table? I'm pretty sure that's what I'm seeing. If it is expected behavior, what can be done to deal with it in mwan3?

My setup still works because the packets simply fall through to the main table and get routed out correctly from there. I could see this behavior causing problems, though.
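For what it's worth, a quick way to watch the mark survive decapsulation (the log prefixes are made up; protocol 41 is IPv6-in-IPv4):

# Log the encapsulated packet on the IPv4 side and the decapsulated
# packet on the IPv6 side; if MARK= matches in both log lines, the skb
# mark is being preserved across the 6in4 decapsulation.
iptables -t mangle -I PREROUTING -p 41 -s 184.105.250.46 -j LOG --log-prefix "6in4-v4: "
ip6tables -t mangle -I PREROUTING -i 6in4-hurricane -j LOG --log-prefix "6in4-v6: "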

yayfortrees commented May 09 '22 17:05

I have noticed similar issues with IPv6 traffic under the mwan3 2.10 versions as well. The GitHub issue I created a while ago is quite long, but based on the iptables logging it was determined that there was some weird packet marking going on:

https://github.com/openwrt/packages/issues/14332

I currently use NAT6, though. I have considered NETMAP, but I can't find any documentation on how it can be set up or configured in OpenWrt.

Would you be able to provide some rough pointers on how you use NETMAP? I'd love to try it.

My solution for the moment has been to downgrade to mwan3 2.8.16, as its IPv6 functionality doesn't have this problem.

jamesmacwhite commented May 29 '22 08:05

@feckert I think this is an interesting area which I've hit before as well, although @yayfortrees has provided information that highlights it better than my issue in #14332 and might be something to look into regarding the netfilter/marking behaviour in mwan3.

jamesmacwhite commented Sep 14 '22 18:09

I've encountered the same issue. The mark of the underlying IPv4 tunnel connection is assigned to all incoming IPv6 packets, and I've realized that it totally breaks mwan3 IPv6 routing.

It affects not only incoming connections; it affects all outgoing connections too (see the connmark sketch after this list):

  1. The first outgoing packet gets the proper mark (for example, MARK=0x600).
  2. The reply packet is already marked 0x3f00 before reaching mwan3_hook; as a result, the connmark is updated to 0x3f00.
  3. All subsequent outgoing packets then get MARK=0x3f00 (from the connmark).
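Illustrative only, not mwan3's literal rules, but the save/restore pair works roughly like this, which is why one wrongly marked reply packet poisons the whole connection:

# Copy any mark previously saved in conntrack back onto the packet...
ip6tables -t mangle -A mwan3_hook -j CONNMARK --restore-mark --nfmask 0x3f00 --ctmask 0x3f00
# ...and later save the packet mark into the connection; a stray 0x3f00
# saved here is restored onto every subsequent packet of the connection.
ip6tables -t mangle -A mwan3_hook -j CONNMARK --save-mark --nfmask 0x3f00 --ctmask 0x3f00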

For anyone interested, could you please try this patch: https://github.com/anyuta1166/packages/commit/09dcf607d1c273c2b9402dbb2e20a000273c3516

I attempted to fix this by clearing the mark on incoming packets on WAN interfaces only (to make sure we do not break interface tracking and the "mwan3 use" command).

Basically it adds a few ip6tables rules:

ip6tables -t mangle -N mwan3_ifaces_pre
ip6tables -t mangle -N mwan3_iface_pre_wan6
# Send packets that already carry a mark to the per-interface pre chain...
ip6tables -t mangle -A mwan3_ifaces_pre -m mark ! --mark 0x0/0x3f00 -j mwan3_iface_pre_wan6
# ...which clears the stale mark, but only for packets arriving on the tunnel.
ip6tables -t mangle -A mwan3_iface_pre_wan6 -i 6in4-wan6 -m mark ! --mark 0x0/0x3f00 -j MARK --set-xmark 0x0/0x3f00
# Hook the pre chains in ahead of the normal mwan3_hook processing.
ip6tables -t mangle -I mwan3_hook -m mark ! --mark 0x0/0x3f00 -j mwan3_ifaces_pre

I didn't try it with NAT66, but without NAT it works fine.
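To check whether the mark-clearing rule is actually matching anything, the counters on the new chain are the quickest signal:

# Non-zero packet/byte counters mean tunnel traffic is hitting the
# mark-clearing rule; all-zero counters mean it is never matched.
ip6tables -t mangle -L mwan3_iface_pre_wan6 -v -n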

anyuta1166 commented Apr 29 '23 11:04

Thanks for your patch. I tried this on mwan3 2.10 (I noted that this was originally tested with 2.11?). I have a PPPoE WAN and 6in4, and I was able to apply the patch without issues:

root@linksys-wrt3200acm:/lib/mwan3# patch -u -b mwan3.sh -i 09dcf607d1c273c2b9402dbb2e20a000273c3516.patch
patching file mwan3.sh
Hunk #1 succeeded at 266 with fuzz 2 (offset 2 lines).
Hunk #2 succeeded at 276 (offset -19 lines).
Hunk #3 succeeded at 360 with fuzz 2 (offset -28 lines).
Hunk #4 succeeded at 390 (offset -30 lines).
Hunk #5 succeeded at 405 (offset -30 lines).

However, I don't think it has fixed the issue; I am using NAT66, though.

The symptom I have with IPv6 is that my PPPoE WAN6 is working and ipv6.google.com works, but connections to Hurricane Electric IPv6 addresses are failing. If I use test-ipv6.com, which uses various Hurricane Electric IPv6 addresses for its tests, I get the following failures on the IPv6 test:

Test with IPv6 DNS record
timeout (15.402s)
[https://ipv6.vm3.test-ipv6.com/ip/?callback=?](https://ipv6.vm3.test-ipv6.com/ip/?callback=?&testdomain=test-ipv6.com&testname=test_aaaa)

Test IPv6 large packet
timeout (15.360s)
[https://mtu1280.vm3.test-ipv6.com/ip/?callback=?&size=1600&fill=xxx...xxx](https://mtu1280.vm3.test-ipv6.com/ip/?callback=?&size=1600&fill=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&testdomain=test-ipv6.com&testname=test_v6mtu)

Find IPv6 Service Provider
timeout (15.438s)
[https://ipv6.lookup.test-ipv6.com/ip/?callback=?&asn=1](https://ipv6.lookup.test-ipv6.com/ip/?callback=?&asn=1&testdomain=test-ipv6.com&testname=test_asn6)

These all report timeouts, I think because Hurricane Electric 6in4 traffic is broken in some way; other IPv6 addresses I can reach fine.

Example:

nslookup mtu1280.vm3.test-ipv6.com
Server:         172.20.192.1
Address:        172.20.192.1#53

Non-authoritative answer:
Name:   mtu1280.vm3.test-ipv6.com
Address: 2001:470:1:18::3:1280

The test domain is in Hurricane Electric IPv6 space, so I believe this is down to the fwmark issues with 6in4: the traffic is not being routed properly.
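One way to see the effect of a stray mark on route selection, using the address resolved above (the 0x400 value is taken from the hurricane rules earlier in the thread and may differ per setup):

# Route lookup with a correct per-interface mark...
ip -6 route get 2001:470:1:18::3:1280 mark 0x400
# ...versus the lookup for a packet stuck with the blanket 0x3f00 mark,
# which falls through to the default route in the main table.
ip -6 route get 2001:470:1:18::3:1280 mark 0x3f00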

jamesmacwhite commented Apr 29 '23 12:04

@jamesmacwhite could you please post your mwan3 config? I'll try to reproduce the issue.

anyuta1166 commented Apr 29 '23 13:04

Sure. Attached. For some context:

I have two main WAN interfaces, wan and wanb, in a failover configuration, not a balanced one.

The WireGuard interfaces are for Mullvad VPN; they are not actual physical WANs.

mwan3.txt

mwan3 status shows they are all up

Interface status:
 interface wan is online 00h:23m:49s, uptime 211h:15m:37s and tracking is active
 interface wan6 is online 00h:23m:48s, uptime 211h:15m:33s and tracking is active
 interface wanb is online 00h:23m:49s, uptime 211h:15m:34s and tracking is active
 interface wanb6 is online 00h:23m:49s, uptime 211h:15m:37s and tracking is active
 interface wg is online 00h:23m:49s, uptime 211h:15m:37s and tracking is active
 interface wg6 is online 00h:23m:49s, uptime 211h:15m:37s and tracking is active
 interface wgb is online 00h:23m:49s, uptime 211h:15m:37s and tracking is active
 interface wgb6 is online 00h:23m:49s, uptime 211h:15m:37s and tracking is active
 interface wgc is online 00h:23m:49s, uptime 211h:15m:37s and tracking is active
 interface wgc6 is online 00h:23m:49s, uptime 211h:15m:37s and tracking is active

So it's not an issue with an interface being "down".

jamesmacwhite commented Apr 29 '23 13:04

I've set up NAT66 and run some tests with native IPv6 + 6in4.

Without my patch, I see 50% ping loss as stated in #14332 (this happens when the mwan3 default rule is assigned to 6in4 while the default interface in the main routing table is native IPv6). The first packet gets the correct mark and goes out via 6in4, but the second packet gets mark 0x3f00 and goes out via the default routing table from the wrong interface, and so on.
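The alternation is easiest to see in the policy routing rules. The rule lines below are a hypothetical illustration of their shape, not copied from a real system: per-interface fwmark values select per-interface tables, while a packet already marked 0x3f00 falls through to the main table.

ip -6 rule show
#   2001: from all fwmark 0x100/0x3f00 lookup 1
#   2002: from all fwmark 0x200/0x3f00 lookup 2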

With my patch everything works fine!

Tested on OpenWrt 22.03-snapshot with mwan3 2.11.4 and this patch: https://github.com/anyuta1166/packages/commit/09dcf607d1c273c2b9402dbb2e20a000273c3516

anyuta1166 commented Apr 29 '23 19:04

Thanks for testing. I guess the main difference with my setup is OpenWrt 21.02.6 and mwan3 2.10 rather than 22.03 and mwan3 2.11.

Unfortunately, my Linksys WRT3200ACM router has various issues with 22.03, so I have avoided upgrading; builds for it were stopped after 22.03.2 because of broken switch behaviour.

Without the patch, IPv6 on the primary WAN does work, but Hurricane Electric specific IPv6 addresses do not get routed. I have flagged this with my ISP in case there is a specific issue with the peering between the two networks, but I figured it's probably more likely something related to mwan3.

Running the patch doesn't seem to have changed anything directly in my case, but equally it hasn't broken anything either.

The problem is likely more visible because 6in4 is not the default IPv6 route.

jamesmacwhite commented Apr 29 '23 19:04

As a workaround, if I implement this rule in mwan3:

config rule 'henet'
        option dest_ip '2001:470::/32'
        option family 'ipv6'
        option use_policy 'henet_only'

This allows HE traffic to work, because I'm now forcing any Hurricane Electric IPv6 traffic through 6in4 itself. Confirmed by checking traceroute. So unless my PPPoE WAN does have specific issues with Hurricane Electric IPv6, this might be a workaround for now.
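For completeness, the rule above presupposes a policy of that name. A sketch of what it might look like in /etc/config/mwan3; the member name and the assumption that the 6in4 interface is called wanb6 are mine, not taken from my actual config:

config member 'henet_m1'
        option interface 'wanb6'
        option metric '1'

config policy 'henet_only'
        list use_member 'henet_m1'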

test-ipv6.com works, but my IPv6 address is detected as coming from my 6in4 WAN interface, because the IPv6 tests go through HE.net.

This does have the disadvantage of removing failover for any Hurricane Electric IPv6 traffic, but given it's not my primary IPv6 anyway, I guess it could work. Long term we shouldn't be relying on 6in4 anyway, but IPv6 adoption is slow, etc.

jamesmacwhite commented Apr 30 '23 04:04