
mwan3: incompatible with xfrm interfaces (used for ipsec)

Open · patrakov opened this issue 2 years ago · 1 comment

Maintainer: @aaronjg @feckert
Environment: OpenWrt 22.03.1 on x86_64 (for debugging; it will later be transferred to a different device)

Description:

I tried to solve the following problem. There are two ISPs, wana (primary) and wanb (backup). Both provide non-public IP addresses. There is also a VPN provider that gives out a public IP address, but it only supports IKEv2. I want to connect to this VPN using strongSwan and route traffic through wana, wanb, or the VPN as appropriate, based on policies that I would create with mwan3. Traffic from the local networks should be NATed as usual; this also applies to traffic that goes through the VPN.

The requirement to NAT the traffic that goes through the VPN means that I should use an XFRM interface. Otherwise (i.e. with the default strongSwan setup) we would need an SNAT to an IP address that is not known in advance and is not the primary IP of any interface, which would be too complicated in OpenWrt. Additionally, XFRM interfaces handle MTU issues much better, and they can be added to mwan3 like any other interface, so that traffic is directed to them as needed via the mwan3 configuration.

However, even without trying to implement redundancy between wana and wanb, I found that merely having wana in the mwan3 configuration file breaks the VPN. Namely, incoming packets (e.g. pings) that arrive via the VPN are answered directly via one of the WAN interfaces instead of through the VPN, which obviously can't work.

I found that a similar issue (#9905) with VTI interfaces had already been reported, but it was closed because the submitter did not respond; hence this intentional duplicate.

Here is what happens.

The VPN is established as usual through wana, and this packet exchange creates a conntrack entry. As the policy is to send this traffic through wana, mwan3 assigns the connmark 0x100/0x3f00 to this connection, restores it as the firewall mark on every packet that belongs to this connection, and then makes routing decisions based on that mark.
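The connmark bookkeeping described above can be modelled in a few lines of Python. This is a simplified sketch, not mwan3's actual code: `restore_mark` and `save_mark` follow the documented formulas of the iptables `CONNMARK --restore-mark` and `--save-mark` targets with both masks set to 0x3f00, while `mwan3_hook` and its logic (restore only while the mwan3 bits are zero, consult the policy only for packets that are still unmarked) are assumptions inferred from the behaviour described in this issue. The values 0x100 (wana) and 0x3f00 (mask) are the ones quoted here; 0x200 is an arbitrary stand-in for a different policy result.

```python
# Simplified, hypothetical model of mwan3's connmark handling; the mark
# values are the ones quoted in this issue, the hook logic is assumed.
MASK = 0x3f00  # mwan3 mark bits
WANA = 0x100   # mark for connections routed via wana


def restore_mark(nfmark: int, ctmark: int) -> int:
    """Like CONNMARK --restore-mark --nfmask MASK --ctmask MASK."""
    return (nfmark & ~MASK) ^ (ctmark & MASK)


def save_mark(ctmark: int, nfmark: int) -> int:
    """Like CONNMARK --save-mark --nfmask MASK --ctmask MASK."""
    return (ctmark & ~MASK) ^ (nfmark & MASK)


def mwan3_hook(nfmark: int, ctmark: int, policy_mark: int) -> tuple[int, int]:
    """Return (packet mark, connection mark) after the simplified hook."""
    if (nfmark & MASK) == 0:  # assumption: only unmarked packets are restored
        nfmark = restore_mark(nfmark, ctmark)
    if (nfmark & MASK) == 0:  # still unmarked: new connection, apply policy
        nfmark = (nfmark & ~MASK) ^ policy_mark
    return nfmark, save_mark(ctmark, nfmark)


# First packet of the IKE connection: no connmark yet, policy says wana.
mark1, ct = mwan3_hook(0, 0, policy_mark=WANA)
# Later packet of the same connection: its mark comes from conntrack, so a
# different policy result (0x200 here) no longer matters.
mark2, ct = mwan3_hook(0, ct, policy_mark=0x200)
print(hex(mark1), hex(mark2), hex(ct))  # 0x100 0x100 0x100
```

Once a connection has a connmark, the policy is effectively bypassed for all its later packets; this is what makes a stale or inherited mark so sticky.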

Suppose that somebody pings the public IP address that the VPN provider has assigned to our router.

The ICMP echo packet goes from the internet to the VPN provider, whose VPN server encrypts and encapsulates it as appropriate. The encapsulated packet is sent to our router via UDP from port 4500 to port 4500.

Our router receives the UDP packet. It traverses the mangle table via the PREROUTING chain. Initially, the firewall mark of the packet is zero. It is immediately restored from the connmark to 0x100/0x3f00, because the packet belongs to the IPsec connection that was initially established over wana. For some unknown reason, it is later overwritten with 0x3f00/0x3f00, but this is irrelevant.

Then the kernel decrypts the packet, and pretends that the decrypted packet arrives via the xfrm30 interface (which corresponds to the VPN). The packet goes through all the rules of iptables again.

So mwan3 sees the packet again in the PREROUTING chain of the mangle table. However, from the very start it carries the fwmark 0x3f00/0x3f00, which prevents mwan3 from recognizing that it belongs to a separate connection. This incorrect fwmark is then saved to the connmark, although the correct connmark would have been 0x300/0x3f00.

At this point, the kernel knows about two connections, UDP 4500 and ICMP, and both connmarks are set to 0x3f00/0x3f00: correctly for the UDP connection, and incorrectly (it should be 0x300/0x3f00) for the ICMP connection.
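The two passes through PREROUTING described above can be replayed with a small model. It is a hypothetical sketch: the mark values are the ones quoted in this issue, but the hook logic (restore and classify only while the mwan3 bits are zero) is an assumption inferred from the reported behaviour, and the unexplained overwrite with 0x3f00 followed by a save is modelled explicitly only to reproduce the connmarks reported above.

```python
# Hypothetical model of the two passes through mangle/PREROUTING. Mark values
# are those quoted in this issue; the hook logic is an assumption.
MASK = 0x3f00         # mwan3 mark bits
WANA = 0x100          # wana
VPN = 0x300           # xfrm30: the connmark the ICMP connection should get
UNEXPLAINED = 0x3f00  # value later written over 0x100 on the UDP packet


def restore_mark(nfmark, ctmark):  # CONNMARK --restore-mark, masked
    return (nfmark & ~MASK) ^ (ctmark & MASK)


def save_mark(ctmark, nfmark):  # CONNMARK --save-mark, masked
    return (ctmark & ~MASK) ^ (nfmark & MASK)


def hook(nfmark, ctmark, in_iface_mark):
    if (nfmark & MASK) == 0:  # assumption: only unmarked packets are restored
        nfmark = restore_mark(nfmark, ctmark)
    if (nfmark & MASK) == 0:  # still unmarked: new connection, record input iface
        nfmark = (nfmark & ~MASK) ^ in_iface_mark
    return nfmark, save_mark(ctmark, nfmark)


# Pass 1: encrypted UDP 4500 packet arrives on wana with mark 0.
skb_mark, udp_ct = hook(0, WANA, WANA)       # restored to 0x100
skb_mark = (skb_mark & ~MASK) ^ UNEXPLAINED  # the unexplained rewrite
udp_ct = save_mark(udp_ct, skb_mark)         # 0x3f00, as reported

# Pass 2: after decryption the same skb, still carrying 0x3f00, re-enters
# PREROUTING via xfrm30 as the first packet of a new ICMP connection.
skb_mark, icmp_ct = hook(skb_mark, 0, VPN)   # restore and classification skipped
print(hex(udp_ct), hex(icmp_ct))             # 0x3f00 0x3f00 (0x300 was expected)
```

In this model the decrypted packet never reaches the classification step, because the mark it inherited from the encrypted packet already makes it look "handled".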

The kernel receives a ping and sends an ICMP echo reply as a response.

mwan3 sees this echo reply in the OUTPUT chain of the mangle table. Initially its fwmark is 0x0/0x3f00, but as the first step it is restored to 0x3f00/0x3f00, because that is the (wrong) value stored in the connmark of the ICMP connection. Therefore, the echo reply is incorrectly routed via the default routing table instead of via xfrm30.
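The routing consequence for the reply can be sketched as follows, starting from the ICMP connmark reported above. This is a hypothetical model: the mapping from mark to routing table uses illustrative labels rather than real mwan3 table numbers, and the "restore only while unmarked" rule is the same assumption inferred from this issue.

```python
# Hypothetical model of the echo reply in mangle/OUTPUT, starting from the
# ICMP connmark reported in the issue. Table labels are illustrative only.
MASK = 0x3f00
TABLES = {0x100: "wana", 0x300: "xfrm30", 0x3f00: "default"}


def restore_mark(nfmark, ctmark):  # CONNMARK --restore-mark, masked
    return (nfmark & ~MASK) ^ (ctmark & MASK)


def reply_table(icmp_ctmark, socket_mark=0):
    nfmark = socket_mark  # locally generated reply: starts as 0x0/0x3f00
    if (nfmark & MASK) == 0:  # assumption: only unmarked packets are restored
        nfmark = restore_mark(nfmark, icmp_ctmark)
    return TABLES.get(nfmark & MASK, "default")


print(reply_table(0x3f00))  # default: what actually happens
print(reply_table(0x300))   # xfrm30: what should happen
```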

In other words, the root of the problem is that, due to the fwmark being present from the very beginning, mwan3 did not recognize that the ping came via the VPN.

Based on my reading of the iptables manual page, the PREROUTING chain of the mangle table is used for altering incoming packets. There is nothing like SO_MARK that could set the initial fwmark of a packet arriving through a normal (non-IPsec) interface to anything other than zero, and mwan3 simply assumes that it is initially zero. That assumption is wrong for XFRM interfaces.

I was able to fix my problem by inserting this rule (of course the same applies to IPv6):

iptables -t mangle -I PREROUTING 1 -m comment --comment "Do not inherit the mark of encrypted packets" -j MARK --set-xmark 0x0/0x3f00

As explained above, it is safe to reset the mark to zero on all incoming packets so that it is restored from the connmark immediately afterwards: for non-IPsec packets the reset is a no-op, because their mark is zero anyway.
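The effect of the inserted rule can be checked in the same kind of model. This sketch assumes the hook logic inferred from this issue (restore and classify only while the mwan3 bits are zero); `set_xmark` follows the documented semantics of `MARK --set-xmark value/mask` (clear the bits in mask, then XOR in value). The first call reproduces the bug, the second shows the fix, and the assertion checks the no-op claim for an ordinary packet.

```python
# Hypothetical model of the fix: a MARK --set-xmark 0x0/0x3f00 rule placed
# before the (assumed) mwan3 hook logic. Mark values are from this issue.
MASK = 0x3f00
WANA, VPN = 0x100, 0x300


def set_xmark(nfmark, value, mask):  # MARK --set-xmark value/mask
    return (nfmark & ~mask) ^ value


def restore_mark(nfmark, ctmark):  # CONNMARK --restore-mark, masked
    return (nfmark & ~MASK) ^ (ctmark & MASK)


def save_mark(ctmark, nfmark):  # CONNMARK --save-mark, masked
    return (ctmark & ~MASK) ^ (nfmark & MASK)


def prerouting(nfmark, ctmark, in_iface_mark, with_fix):
    if with_fix:  # the rule inserted at position 1 of PREROUTING
        nfmark = set_xmark(nfmark, 0x0, MASK)
    if (nfmark & MASK) == 0:  # assumption: only unmarked packets are restored
        nfmark = restore_mark(nfmark, ctmark)
    if (nfmark & MASK) == 0:  # new connection: record the input interface
        nfmark = (nfmark & ~MASK) ^ in_iface_mark
    return nfmark, save_mark(ctmark, nfmark)


# Decrypted ICMP packet still carrying 0x3f00, new connection (ctmark 0).
print([hex(m) for m in prerouting(0x3f00, 0, VPN, with_fix=False)])  # ['0x3f00', '0x3f00']
print([hex(m) for m in prerouting(0x3f00, 0, VPN, with_fix=True)])   # ['0x300', '0x300']
# An ordinary packet arrives with mark 0, so the extra rule changes nothing.
assert prerouting(0, WANA, WANA, True) == prerouting(0, WANA, WANA, False)
```

Because the rule's mask is 0x3f00, mark bits outside mwan3's range (e.g. bits set by other firewall rules) are left untouched.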

However, this may not be the full fix, as the same thing could, in theory, happen on OUTPUT. I tried to reproduce this with a simple rule that sends all traffic to 9.9.9.9 via the VPN, but it works, even for traffic originating from the router. I don't know why.

Note that the original issue (#9905) mentions that it doesn't work with WireGuard interfaces either, for a similar reason. I haven't tried this, and I don't know whether my concern about OUTPUT would materialize there.

Still, if you believe that fixing only the PREROUTING chain (by unconditionally initializing the packet mark to zero) is not sufficient, please consider that we can't do the same initialization on OUTPUT, because it would break the "mwan3 use" case, which works by setting the initial mark on the socket.
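Why the same reset cannot simply be copied to OUTPUT can be illustrated with a sketch. Here the socket mark stands for the initial mark that "mwan3 use" sets on the socket; `WANB = 0x200` is an illustrative value not taken from this issue, and the "restore only while unmarked" rule is again an assumption inferred from the reported behaviour.

```python
# Hypothetical model of why the same reset cannot be applied in OUTPUT.
# WANB = 0x200 is an illustrative mark, not taken from this issue.
MASK = 0x3f00
WANB = 0x200


def set_xmark(nfmark, value, mask):  # MARK --set-xmark value/mask
    return (nfmark & ~mask) ^ value


def restore_mark(nfmark, ctmark):  # CONNMARK --restore-mark, masked
    return (nfmark & ~MASK) ^ (ctmark & MASK)


def output_mark(socket_mark, ctmark, reset_first):
    nfmark = socket_mark  # initial mark taken from the socket ("mwan3 use")
    if reset_first:
        nfmark = set_xmark(nfmark, 0x0, MASK)
    if (nfmark & MASK) == 0:  # assumption: only unmarked packets are restored
        nfmark = restore_mark(nfmark, ctmark)
    return nfmark & MASK


# First packet of a new connection from a process pinned to wanb.
print(hex(output_mark(WANB, 0, reset_first=False)))  # 0x200: pinning honoured
print(hex(output_mark(WANB, 0, reset_first=True)))   # 0x0: pinning lost
```

In OUTPUT, a non-zero initial mark can be intentional, so an unconditional reset cannot tell a deliberate socket mark from one inherited through a loop.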

I guess a more general solution would be to reserve one bit of the mark, ignore it in routing decisions, and always set it at the end of the mwan3_hook chain. At the very beginning of the mwan3_hook chain, test it: if it is set, the packet has been looped (e.g. through encryption) and should be treated as a fresh one, by reinitializing the whole mark to zero before restoring it from the connmark.
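The reserved-bit idea can be sketched in the same model. Everything here is hypothetical: `SEEN = 0x4000` is an arbitrary bit outside the 0x3f00 range chosen only for illustration, `WANB = 0x200` is an illustrative socket mark, and the hook logic is the assumption used throughout. The sketch shows that a packet which already passed through the hook (the decrypted one) is reclassified, while a packet carrying a deliberate socket mark keeps it.

```python
# Hypothetical sketch of the reserved-bit idea. SEEN = 0x4000 is an arbitrary
# bit outside mwan3's 0x3f00 range, chosen only for illustration.
MASK = 0x3f00
SEEN = 0x4000
WANA, WANB, VPN = 0x100, 0x200, 0x300  # WANB is illustrative


def set_xmark(nfmark, value, mask):  # MARK --set-xmark value/mask
    return (nfmark & ~mask) ^ value


def restore_mark(nfmark, ctmark):  # CONNMARK --restore-mark, masked
    return (nfmark & ~MASK) ^ (ctmark & MASK)


def save_mark(ctmark, nfmark):  # CONNMARK --save-mark; SEEN is not saved
    return (ctmark & ~MASK) ^ (nfmark & MASK)


def hook(nfmark, ctmark, new_conn_mark):
    if nfmark & SEEN:  # looped through the hook before: treat as fresh
        nfmark = set_xmark(nfmark, 0x0, MASK | SEEN)
    if (nfmark & MASK) == 0:
        nfmark = restore_mark(nfmark, ctmark)
    if (nfmark & MASK) == 0:
        nfmark = (nfmark & ~MASK) ^ new_conn_mark
    return nfmark | SEEN, save_mark(ctmark, nfmark)  # set SEEN at the end


outer, _ = hook(0, WANA, WANA)        # encrypted packet: 0x100, SEEN now set
inner, icmp_ct = hook(outer, 0, VPN)  # decrypted packet is reclassified
local, _ = hook(WANB, 0, WANA)        # socket-marked packet: SEEN unset, kept
print(hex(inner & MASK), hex(icmp_ct), hex(local & MASK))  # 0x300 0x300 0x200
```

This sketch clears only the mwan3 bits and the reserved bit rather than the whole mark, so marks set by unrelated rules survive; clearing the whole mark, as suggested in the text, would be the simpler but more intrusive variant.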

Note to readers: I still haven't found out how to fail over the IPsec connection itself between wana and wanb, but that's a separate issue. This one is only about compatibility, i.e. making the VPN work at all when mwan3 is present, so that replies to packets that arrived via the VPN also go out through the VPN.

patrakov, Oct 15 '22

@feckert If you want, I can provide VM images with everything set up; just tell me if you need them.

patrakov, Oct 19 '22

Hello Alexander,

sorry for the late reply!

I don't know if I can help you there. Unfortunately, I don't have a setup with mwan3 and IPsec like the one you describe. I know there are problems because of the special IPsec VPN handling in the kernel, but I only use mwan3 on physical interfaces: mwan3 only tracks the physical interfaces, and IPsec is established over one of them. So I have a very simple setup.

At the moment I don't have much time to dive deeper into this. I would help you if you have any questions about the code. If you have a pull request to fix this, I would look at it. It would also help if you could send me the UCI configurations for this setup, so that I can test the suggested changes in a staging/test area.

Best regards

Florian


feckert, Oct 22 '22

I've encountered exactly the same issue with 6in4 IPv6 tunnels. It seems that resetting the mark to zero on incoming packets should be enough to fix it.

anyuta1166, May 02 '23