firewall4 icon indicating copy to clipboard operation
firewall4 copied to clipboard

Reorder early state dispatch for quicker outcome

Open brada4 opened this issue 1 year ago • 9 comments

locate offload at the end of slowpath ... use builtin tcpudp filter in place of extra filter ... and directly yield to offload-add kworker

drop invalid asap and avoid further activity on useless packets ... which accidentally simplifies main state dispatch ... so make use of optimized output chain dispatch alternatives depending on global setting

Thanks-to: @CallMeR for tcpudp filter avoidance idea Thanks-to: forum user kvic for detailed review and suggestions Discussed: https://github.com/openwrt/firewall4/pull/20 Part-reverts: https://github.com/openwrt/firewall4/commit/19a8caf614ec338513e58535ea02c6ee52988170

Signed-Off-By: Andris PE [email protected]

brada4 avatar Feb 28 '24 15:02 brada4

@jow- diff is identical to #20 , share if any (non-revolutionary) changes can improve it. Diff visualisation misses logic change: old: filter.forward if offload add flow dispatch states new: filter.forward if offload dispatch states diverting to offload chain else dispatch states

brada4 avatar Feb 28 '24 15:02 brada4

Dropping invalid packets over localhost would be swapping iif lo and ct state in output along removing iif != in new prerouting. I dont feel either way, so I maintained behaviour exactly.

brada4 avatar Feb 29 '24 14:02 brada4

@jow- this alters semantics for improved safety discarding invalid (out of state and bad checksum) packets before nat alg helpers.

brada4 avatar Jun 03 '24 15:06 brada4

@jow- made it vmap, netfilters own examples now has vmaps everywhere....

  • drop invalid early
  • change comments (not meant to obfuscate change)
  • use whole output lines in place of string concatenation, just to avoid superlong lines in code
  • add offload hash on first qualifying packet, not second, so that second packet is already offloaded. new connection -> first ack after synack sets hash (used to only set "assured" state and hash set at next packet) timer retirement -> hash is restored after mandatory slowpath (it is just state dispatch, not the full filter) for certain, not tried before mandatory slowpath traversal

brada4 avatar Jun 06 '24 09:06 brada4

@jow- hi, got nice pro feedback at https://forum.openwrt.org/t/first-rule-in-chain-input-output-for-firewall4/204723 and implemented best parts, 1 cosmetic 2 improves NAT performance by dozen hairs

brada4 avatar Jul 24 '24 06:07 brada4

Also discovered that this adds easy flowtable exception via /e/n.d/ for more fifo-ish behaviour (still to dig up test case)

brada4 avatar Jul 25 '24 09:07 brada4

Should this PR drop commit https://github.com/openwrt/firewall4/pull/22/commits/a625924e002c50206509e85f32084707c18f22cb since it is partially reverted in https://github.com/openwrt/firewall4/pull/22/commits/5dc4d82932ae0c7a9416f0969dc695b60250be2c ?

jow- avatar Jul 25 '24 12:07 jow-

No, it should sray like this short simple. 1k evaluations on a pc totals to about same 7.abit ms for either but vmap version has broader deviation not explainable by any significant cpu consumption or absent in case network load.

brada4 avatar Jul 25 '24 15:07 brada4

Yes, default configuration is revert (2 rules swapped tough)

brada4 avatar Jul 30 '24 08:07 brada4

Ill split this in 2 pieces - 1/2 handling invalid packets early 2/2 jump-branching offload

brada4 avatar Mar 28 '25 09:03 brada4

Hi @brada4 this patch is still valid with latest update right?

pesa1234 avatar Mar 31 '25 08:03 pesa1234

It is still valid. if i split it 2nd half has to be heavily re-based other patches reduce non-mandatory packet examination in kind of default established,related accept adding some throughput to underpowered soc cpus

brada4 avatar Mar 31 '25 08:03 brada4

That pull request might help Mediatek CPU device owners when HW offload is enabled. @jow- , what @brada4 needs to do to make that PR to be merged?

danpawlik avatar Apr 17 '25 09:04 danpawlik

I am splitting this in 3 pieces, later today.

brada4 avatar Apr 17 '25 09:04 brada4

@brada4 thank you. my understanding from your posts is that this PR #22 will be replace by 3 separate PR.

  1. PR #56
  2. PR #59
  3. yet to come

correct?

glassd00r avatar Apr 28 '25 05:04 glassd00r

Based on my own usage and observation, I have identified some potential negative impacts (PR https://github.com/openwrt/firewall4/pull/59) :

  • Risk of misjudging legitimate traffic

Scenario: Certain special protocols (such as non-standard NAT traversal, legacy applications) or abnormal network environments (such as incorrect configuration of NAT devices) may cause normal traffic to be mislabeled as "invalid".

Impact: If legitimate traffic (such as UDP connections that are not correctly tracked, packets with failed fragmentation reassembly) is dropped, it may lead to connection interruptions or abnormal service operations.

  • Issues in handling fragmented packets

Mechanism: Conntrack relies on the first fragment to establish the connection state. If the first fragment is lost, subsequent fragments will be regarded as "invalid" and dropped.

Impact: For applications that rely on fragmented packet transmission (such as large file transfers, certain VPN protocols), it may cause performance problems or data loss. However, this is a normal protective behavior (subsequent packets without the initial fragment are inherently abnormal traffic).

CallMeR avatar Apr 28 '25 05:04 CallMeR

@CallMeR If you see the diagram conntrack state is classified at -200, last chance to make it valid (or notrack for more obvious usage) was respective raw table, e.g setting back same TTL to fix checksum.

brada4 avatar Apr 28 '25 08:04 brada4

@glassd00r yes, thats correct. 3rd patch should be simple

+if offload devs > 0
+ ct state established related goto handle_offload
+else
 ct state established related accept
+endif
... later in file
+ if offload devs >0 
+ chain handle offload
+  add flow ... accept
+ accept

brada4 avatar Apr 28 '25 08:04 brada4

@brada4 sorry if this has been explained before. why break up pr #22 into 3 separate PR? pros and cons?

glassd00r avatar Apr 28 '25 08:04 glassd00r

@glassd00r Why one patch? It changes same lines over and over again Bad: it looks messy in totality Why layer up 3 patches Good: simple few-line patches are easier to understand.

brada4 avatar Apr 28 '25 08:04 brada4

@brada4 cool. will test and feedback after you push part3 of the PR.

same custom filter chains rules?

chain` handle_offload {
  # ct bytes < 1000000 counter accept
  # ct packets < 30 counter accept
  # meta l4proto udp counter accept
  # meta length < 100 counter accept
  # ct direction original counter accept
  # the fw4 offload table follows
  # flow add @ft accept
  # accept
}

glassd00r avatar Apr 28 '25 08:04 glassd00r

@glassd00r i remember writing those in forum, yep ill add example in /etc/nftables.d ;-)

brada4 avatar Apr 28 '25 08:04 brada4

Last 3rd here: https://github.com/brada4/firewall4/tree/guard-offload ruleset.uc includes 2nd 3rd too.

brada4 avatar Apr 29 '25 20:04 brada4

I am just trying to concentrate scattered discussions in one place. https://forum.openwrt.org/t/lets-talk-about-firewall4-nftables/231246

brada4 avatar Apr 30 '25 07:04 brada4

I am closing this Somebody remember to submit 3rd part in two years https://github.com/openwrt/firewall4/pull/22#issuecomment-2840172900

brada4 avatar May 19 '25 07:05 brada4