FreeBSD-src icon indicating copy to clipboard operation
FreeBSD-src copied to clipboard

Extra functionality added when killing states

Open brownowski opened this issue 7 years ago • 8 comments

to be used in Bug #8555

Adds two new options into pfctl: -M which will attempt to find and kill a matching state in the opposite direction if it exits -k gateway -k which will find states set to route through a specific gateway and kill these states.

Also, gateway will be shown in state information when printing states with extra verbosity.

When used in conjunction with each other it should allow you to kill states routing out a specific gateway when it goes offline along with all matching NAT states. Only works for policy based routing rules, not where default gateway is set. Can also be used with a interface flush states to kill LAN states when a WAN goes offline.

See Bug #8555 for corresponding patches which trigger a selective state kill on gateway failover.

Code has not been used outside of limited test environment. Posting for review.

brownowski avatar Oct 22 '18 02:10 brownowski

I don't like the idea of keeping more patches on src when it could be submitted to upstream but I believe the best person to decide it is @loos-br

rbgarga avatar Oct 23 '18 10:10 rbgarga

We used to have a switch like this to kill states for a gateway (-G) several years ago but there was some reason we removed it. It may have been to get closer to FreeBSD but I also seem to recall it not working as expected in various ways. I can't find the specific commit that removed the old FreeBSD pfctl patch, however, even in the old tools repo, only when we removed it from the pfSense code base (https://github.com/pfsense/pfsense/commit/ff3da5dba67c64514808e86165e92362f3ff8b33)

jim-p avatar Oct 23 '18 13:10 jim-p

Maybe this one is relevant? https://redmine.pfsense.org/issues/3181 (https://github.com/pfsense/pfsense/commit/36fa13a632bad73fb2a8fdc2e9627e3190ea63c6)

The main problem I seemed to find when looking through some of these older tickets (there are quite a few related to this issue) was that only half the connection states are killed off. Flushing the whole state table was the solution that was selected in most cases. As it stands, policy based routing breaks UDP and ICMP during a failed primary gateway event unless all states are killed. Without killing states TCP can usually recover by timing out the connection and reconnecting from a different source port, however UDP and ICMP usually do not. When using the option to kill states, this includes killing states on all active WANs, and also any internal VLAN to VLAN traffic as well. If you happen to be in the poor situation that one of your WANs (such as your emergency third tier backup 4G connection) goes up and down sometimes, you end up killing the entire network states for a WAN failure that potentially wasn't carrying any relevant states anyway. Killing all states just wasn't an option in our scenario.

The aim of this patch is to make it so states can be killed selectively by gateway or interface but also can kill both inbound and outbound states for the connections selected.

The killing by gateway does only work for rules that have a specific gateway set, so mainly policy based routing rules. I think in rules/states using the default gateway the route-to address is blank/set to 0. With this patch, flushing states by interface will also work along with switch to kill matching inbound/outbound states to achieve a similar function.

At the very least I believe the ability to kill both inbound and outbound states when selecting states to kill makes sense and I don't really know why there would be no option to do this.

The other alternative method to patching the kernel that I looked into was to only kill UDP and ICMP states instead of flushing the entire state table, given that TCP states will generally timeout and create new connections but UDP/ICMP states will generally continue to use the same src/dst ports keeping old states alive and never fail over to the new gateway. At the time I was looking into this, killing via udp or icmp protocol was broken in the code (https://redmine.pfsense.org/issues/8636) but this should have been fixed now and might be a practical alternative to these patches to the kernel. You do need to stop the UDP ports from randomizing during NAT for this method to work best.

brownowski avatar Oct 24 '18 03:10 brownowski

I've been pointed at this patch, and mostly like it. I've pulled it apart into three changes:

  • print gateway
  • kill by gateway
  • kill matching states

The first two are part of the review series I've posted, starting here: https://reviews.freebsd.org/D30051 and all the way to D30059

(Please do let me know how you wish to be credited.)

The last one is problematic. It's managed to encounter the most complex part of pf's locking protocol. Each hash row has its own lock. The two states that define a connection may be in different or in the same hash row. As a result we need to be extremely careful about the order of locking. This is documented in pf_state_key_attach(). The current patch will usually work, but is vulnerable to deadlocks if two kill operations are run at the same time, and in the (unlikely) case of two states being in the same hash row it's going to double lock and deadlock against itself. It's distinctly non-trivial to make the locking work correctly for this part of the patch. My current thinking is to put that part aside until a hypothetical locking rework at some point in the future, but I reserve the right to change my mind about that.

kprovost avatar Apr 30 '21 17:04 kprovost

Thanks for taking a look at the patch. I don't really mind how my code is credited. Whatever would be the standard way is fine. I'm mainly happy to see that there is some interest in bringing some of the functions into the upstream code. When I posted this it was mainly as a proof of concept that I was hoping someone else more experienced in kernel coding might be able to pick up and work from.

The patch submitted here has never been used outside of a limited test environment. I remember setting up the build environment was such a pain. I never even managed to get a proper full build image that would install. I could only manage to grab the binary files for the kernel and updated pfctl utilities and manually copy them over the top of an already installed pfsense build to test them.

I do hope that the killing of matching states on inbound/outbound interfaces will eventually find its way through as well, as it was the main driver behind the patch. Initially we implemented a userspace workaround to the issue which runs in php, which we still actually use on some of our production systems where we require this functionality, despite being much clunkier and having several severe limitations. The kernel patch was created in the hope to have something better which would scale properly that could be included upstream so we didn't have to continue to maintain our own patches to test and apply during upgrades.

brownowski avatar May 01 '21 02:05 brownowski

The usual approach for crediting is to either add a 'Submitted by: ...' tag, or - new in the git world - set the committer field to your name and e-mail address. Given that I've done some work to change your patch (mostly to ensure we don't break the ABI) I'd go with a 'Submitted by:' tag, but I'd like to know the name (and optionally e-mail adres) you'd like me to use.

kprovost avatar May 01 '21 07:05 kprovost

No problem. You can use 'Steven Brown' as the name.

brownowski avatar May 01 '21 09:05 brownowski

It turned out that we could actually avoid the locking issue by being slightly careful about when we search for and delete the matching state: https://reviews.freebsd.org/D30092 (and test https://reviews.freebsd.org/D30093).

kprovost avatar May 03 '21 14:05 kprovost