IPSEC traffic gets denied by "Default deny" when fragmentation comes in
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
- [X] I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
- [X] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue
Describe the bug
We have three OPNsense firewalls (one on 22.1.9_1 and two on 22.1.8_1) connected by IPsec tunnels in routed mode. The topology looks like this: Site A <-IPSEC-> Site B <-IPSEC-> Site C
It seems the firewall does not associate states correctly when fragmented IP traffic carries TCP. The result is that legitimate traffic is blocked by the "Default deny/state violation" rule.
Traffic from A to C is routed through B. Overall this works fine, but certain traffic breaks. On Site C we have a VoIP PBX, and a phone on Site A registers with it using SIP over TCP. So far so good.
Now the phone wants to place a call and sends a rather big INVITE message that needs to be fragmented. The fragments reach the OPNsense on Site B, but are never forwarded toward Site C.
As far as I have analyzed the situation, the packet filter on Site B for some reason does not see the traffic as part of an established connection and therefore denies it. Although I have some experience in the field, the (Free)BSD world is fairly new to me. From what I have read, it could be related to scrubbing in some way, but playing around with the settings in that section (turning it on and off, creating some rules, ...) did not help.
Strangely, not all traffic is affected. UDP works, although packets seem to get lost sometimes, but not always. I am fairly sure this is because of the connectionless nature of UDP.
As there are so many parts involved, I am still not 100% sure I have sorted everything out and am providing all relevant information. I did my best to isolate the problematic situation. Here are some details on what works and what the problem looks like in terms of traffic.
First, something that works: a ping from Site A to Site C that requires fragmentation.
$ ping -c1 -s1450 -n -M dont 10.63.64.17
PING 10.63.64.17 (10.63.64.17) 1450(1478) bytes of data.
1458 bytes from 10.63.64.17: icmp_seq=1 ttl=61 time=8.20 ms
--- 10.63.64.17 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.208/8.208/8.208/0.000 ms
Here are the packets on the OPNsense at Site B.
First on enc0:
11:38:44.361352 (authentic,confidential): SPI 0xc8abef03: IP 10.63.36.3 > 10.63.64.17: ICMP echo request, id 1039, seq 1, length 1376
11:38:44.361383 (authentic,confidential): SPI 0xc8abef03: IP 10.63.36.3 > 10.63.64.17: ip-proto-1
11:38:44.361483 (authentic,confidential): SPI 0xc560d490: IP 10.63.36.3 > 10.63.64.17: ICMP echo request, id 1039, seq 1, length 1376
11:38:44.361512 (authentic,confidential): SPI 0xc560d490: IP 10.63.36.3 > 10.63.64.17: ip-proto-1
11:38:44.368238 (authentic,confidential): SPI 0xcd721cda: IP 10.63.64.17 > 10.63.36.3: ICMP echo reply, id 1039, seq 1, length 1376
11:38:44.368257 (authentic,confidential): SPI 0xcd721cda: IP 10.63.64.17 > 10.63.36.3: ip-proto-1
11:38:44.368279 (authentic,confidential): SPI 0xc136669b: IP 10.63.64.17 > 10.63.36.3: ICMP echo reply, id 1039, seq 1, length 1376
11:38:44.368302 (authentic,confidential): SPI 0xc136669b: IP 10.63.64.17 > 10.63.36.3: ip-proto-1
Then on ipsec3 (interface for tunnel to Site A):
11:38:44.361371 IP 10.63.36.3 > 10.63.64.17: ICMP echo request, id 1039, seq 1, length 1376
11:38:44.361387 IP 10.63.36.3 > 10.63.64.17: ip-proto-1
11:38:44.368275 IP 10.63.64.17 > 10.63.36.3: ICMP echo reply, id 1039, seq 1, length 1376
11:38:44.368299 IP 10.63.64.17 > 10.63.36.3: ip-proto-1
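The fragment sizes in these captures can be reproduced with a little arithmetic. The following is only a sketch: the tunnel MTU of 1400 is an assumption chosen because it yields the 1376-byte first fragment seen above, it is not a value taken from the configuration, and the helper name `fragment` is made up for illustration.

```python
IP_HEADER = 20  # bytes, IPv4 header without options


def fragment(ip_payload_len, mtu):
    """Split an IPv4 payload into (offset, length, more_fragments) tuples."""
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_chunk = (mtu - IP_HEADER) // 8 * 8
    fragments = []
    offset = 0
    while offset < ip_payload_len:
        length = min(max_chunk, ip_payload_len - offset)
        more = offset + length < ip_payload_len
        fragments.append((offset, length, more))
        offset += length
    return fragments


# ping -s1450: 1450 data bytes + 8 byte ICMP header = 1458 bytes of IP payload.
# ASSUMED_MTU is a guess, not a value from the report.
ASSUMED_MTU = 1400
print(fragment(1450 + 8, ASSUMED_MTU))
# → [(0, 1376, True), (1376, 82, False)]
```

Under that assumption the first fragment carries 1376 bytes, which matches the "length 1376" in the ICMP lines, and the second fragment (shown as ip-proto-1) carries the remaining 82 bytes.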
The rest of the route is obviously OK, so everything is fine here. Now for an example where things do not work.
Traffic coming in on enc0 at Site B:
11:59:45.125228 (authentic,confidential): SPI 0xc8abef03: IP 10.63.37.32.50087 > 10.63.64.17.5060: Flags [P.], seq 3177873646:3177874990, ack 3880894718, win 1996, options [nop,nop,TS val 368764 ecr 2529783899], length 1344
11:59:45.125253 (authentic,confidential): SPI 0xc8abef03: IP 10.63.37.32 > 10.63.64.17: ip-proto-6
11:59:45.327985 (authentic,confidential): SPI 0xc8abef03: IP 10.63.37.32.50087 > 10.63.64.17.5060: Flags [P.], seq 0:1344, ack 1, win 1996, options [nop,nop,TS val 368785 ecr 2529783899], length 1344
11:59:45.328007 (authentic,confidential): SPI 0xc8abef03: IP 10.63.37.32 > 10.63.64.17: ip-proto-6
11:59:45.747957 (authentic,confidential): SPI 0xc8abef03: IP 10.63.37.32.50087 > 10.63.64.17.5060: Flags [P.], seq 0:1344, ack 1, win 1996, options [nop,nop,TS val 368827 ecr 2529783899], length 1344
11:59:45.747972 (authentic,confidential): SPI 0xc8abef03: IP 10.63.37.32 > 10.63.64.17: ip-proto-6
11:59:46.587998 (authentic,confidential): SPI 0xc8abef03: IP 10.63.37.32.50087 > 10.63.64.17.5060: Flags [P.], seq 0:1344, ack 1, win 1996, options [nop,nop,TS val 368911 ecr 2529783899], length 1344
11:59:46.588027 (authentic,confidential): SPI 0xc8abef03: IP 10.63.37.32 > 10.63.64.17: ip-proto-6
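The timestamps in the capture above suggest that the same segment is being retransmitted with a growing delay, which is what one would expect when the sender never receives an ACK. A small sketch to make the gaps visible (the timestamps are copied from the lines that show TCP flags; the helper name `gaps` is made up for illustration):

```python
from datetime import datetime

# Timestamps of the first fragments (the lines with TCP flags) from the capture.
stamps = [
    "11:59:45.125228",
    "11:59:45.327985",
    "11:59:45.747957",
    "11:59:46.587998",
]


def gaps(timestamps):
    """Return the time in seconds between consecutive capture timestamps."""
    times = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


for gap in gaps(stamps):
    print(f"{gap:.3f} s")
# prints 0.203 s, 0.420 s, 0.840 s
```

The gap roughly doubles each time, which looks like ordinary TCP retransmission backoff rather than new data being sent.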
This traffic is never seen on ipsec3 (tunnel to Site A). What I find in the logs is this:
2022-07-01T12:00:39 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33476,1376,none,6,tcp,71,10.63.37.32,10.63.64.17,
2022-07-01T12:00:39 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33476,0,+,6,tcp,1396,10.63.37.32,10.63.64.17,50087,5060,1344,PA,3177873646:3177874990,3880894730,1996,,nop;nop;TS
2022-07-01T12:00:12 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33474,1376,none,6,tcp,71,10.63.37.32,10.63.64.17,
2022-07-01T12:00:12 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33474,0,+,6,tcp,1396,10.63.37.32,10.63.64.17,50087,5060,1344,PA,3177873646:3177874990,3880894726,1996,,nop;nop;TS
2022-07-01T11:59:59 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33472,1376,none,6,tcp,71,10.63.37.32,10.63.64.17,
2022-07-01T11:59:59 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33472,0,+,6,tcp,1396,10.63.37.32,10.63.64.17,50087,5060,1344,PA,3177873646:3177874990,3880894722,1996,,nop;nop;TS
2022-07-01T11:59:52 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33471,1376,none,6,tcp,71,10.63.37.32,10.63.64.17,
2022-07-01T11:59:52 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33471,0,+,6,tcp,1396,10.63.37.32,10.63.64.17,50087,5060,1344,PA,3177873646:3177874990,3880894722,1996,,nop;nop;TS
2022-07-01T11:59:49 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33470,1376,none,6,tcp,71,10.63.37.32,10.63.64.17,
2022-07-01T11:59:49 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33470,0,+,6,tcp,1396,10.63.37.32,10.63.64.17,50087,5060,1344,PA,3177873646:3177874990,3880894722,1996,,nop;nop;TS
2022-07-01T11:59:47 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33468,1376,none,6,tcp,71,10.63.37.32,10.63.64.17,
2022-07-01T11:59:47 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33468,0,+,6,tcp,1396,10.63.37.32,10.63.64.17,50087,5060,1344,PA,3177873646:3177874990,3880894718,1996,,nop;nop;TS
2022-07-01T11:59:46 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33467,1376,none,6,tcp,71,10.63.37.32,10.63.64.17,
2022-07-01T11:59:46 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33467,0,+,6,tcp,1396,10.63.37.32,10.63.64.17,50087,5060,1344,PA,3177873646:3177874990,3880894718,1996,,nop;nop;TS
2022-07-01T11:59:46 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33466,1376,none,6,tcp,71,10.63.37.32,10.63.64.17,
2022-07-01T11:59:46 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33466,0,+,6,tcp,1396,10.63.37.32,10.63.64.17,50087,5060,1344,PA,3177873646:3177874990,3880894718,1996,,nop;nop;TS
2022-07-01T11:59:46 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33465,1376,none,6,tcp,71,10.63.37.32,10.63.64.17,
2022-07-01T11:59:46 Informational filterlog 6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,33465,0,+,6,tcp,1396,10.63.37.32,10.63.64.17,50087,5060,1344,PA,3177873646:3177874990,3880894718,1996,,nop;nop;TS
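To make the log lines above easier to read, here is a rough sketch that splits a filterlog entry into named fields. The field names are an assumption based on the commonly described filterlog CSV layout for IPv4 and are not taken from this report. Reading "+" as the more-fragments flag is likewise my interpretation.

```python
# Field names are assumed from the commonly described filterlog IPv4 layout.
IPV4_FIELDS = [
    "rulenr", "subrulenr", "anchor", "label", "interface", "reason",
    "action", "dir", "ipversion", "tos", "ecn", "ttl", "id", "offset",
    "flags", "protonum", "protoname", "length", "src", "dst",
]
TCP_FIELDS = ["srcport", "dstport", "datalen", "tcpflags", "seq", "ack"]


def parse_filterlog(csv_payload):
    """Turn the CSV part of a filterlog line into a dict."""
    values = csv_payload.split(",")
    entry = dict(zip(IPV4_FIELDS, values))
    rest = values[len(IPV4_FIELDS):]
    # Only the first fragment contains the TCP header, so later
    # fragments have no port or sequence fields to parse.
    if entry["protoname"] == "tcp" and len(rest) >= len(TCP_FIELDS):
        entry.update(zip(TCP_FIELDS, rest))
    entry["offset"] = int(entry["offset"])
    entry["length"] = int(entry["length"])
    return entry


first = parse_filterlog(
    "6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,"
    "33476,0,+,6,tcp,1396,10.63.37.32,10.63.64.17,50087,5060,1344,PA,"
    "3177873646:3177874990,3880894730,1996,,nop;nop;TS"
)
second = parse_filterlog(
    "6,,,02f4bab031b57d1e30553ce08e0ec131,enc0,match,block,in,4,0xa0,,63,"
    "33476,1376,none,6,tcp,71,10.63.37.32,10.63.64.17,"
)

for frag in (first, second):
    print(frag["id"], frag["offset"], frag["flags"], frag["action"],
          frag.get("srcport"), frag.get("dstport"))
```

Read this way, each pair of log lines shares one IP id: a first fragment at offset 0 with ports 50087 and 5060, and a second fragment at offset 1376 without any port information. Both are logged with action "block".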
Also on the Site B OPNSense:
# pfctl -s states | grep 10.63.37.32
all tcp 10.63.37.32:50087 -> 10.63.64.17:5060 ESTABLISHED:ESTABLISHED
all tcp 10.63.64.17:5060 <- 10.63.37.32:50087 ESTABLISHED:ESTABLISHED
The states are floating, and from what I understand this should let the traffic through, but here is the block again as seen in the GUI:

Last but not least here are what I think are the relevant rules:
# pfctl -s rules | grep enc0
pass out log on enc0 all flags S/SA keep state label "3ecbdd82fb09322f1d198dcfe3ffc566"
pass in quick on enc0 inet from <Remote_Net> to (lagg1:network) flags S/SA keep state label "d7932be2114f65b5c1e7559a28bdf7f4"
pass in log quick on enc0 inet from <Remote_Net> to <Remote_Net> flags S/SA keep state label "fe43f42923dd80afe67596c2af88498b"
The setup is pretty simple and was simplified further to get this sorted out. Therefore Remote_Net includes all networks reachable through tunnels.
Since the traffic does not reach the ipsecX interface, I am not including more rules and dumps here, but I could of course provide some. I really hope it is not something dumb in the setup that I am overlooking, but for now I am lost.
One workaround is to disable "keep state" on the enc0 interface. The traffic then breaks when it reaches Site C, like this:

Notice that all fragments appear here. In the other UI log view this was not seen, but I guess that is just UI-dependent and has nothing to do with functionality.
Here are the packets on enc0:
13:19:22.042898 (authentic,confidential): SPI 0xcf284c68: IP 10.63.37.32.41905 > 10.63.64.17.5060: Flags [P.], seq 4888:6232, ack 2555, win 3746, options [nop,nop,TS val 846351 ecr 2534562426], length 1344
13:19:22.042978 (authentic,confidential): SPI 0xcf284c68: IP 10.63.37.32 > 10.63.64.17: ip-proto-6
ipsec1 is not reached, and again disabling "keep state" makes everything work.