How to make policy updates take effect immediately

Open ShawnLeung87 opened this issue 1 year ago • 3 comments

How can we make a policy update take effect immediately? We found that after updating the policy with update_gt_lua_state, the new policy only takes effect once expire_sec has elapsed and the flow enters its next round.

ShawnLeung87 avatar Jul 12 '22 08:07 ShawnLeung87

I used gk_flush_flow_table to flush the flow table, but that requires knowing both src_addr and dst_addr. Is it possible to pass only dst_addr and force a flush of every flow entry for that destination?

ShawnLeung87 avatar Jul 12 '22 12:07 ShawnLeung87

I am currently testing with a SYN flood. When the attack reached 300m, the log filled with "lpm: IPv4 lookup miss for x.x.x.x" messages for the source addresses.

ShawnLeung87 avatar Jul 12 '22 12:07 ShawnLeung87

dylib.update_gt_lua_states() and dylib.update_gt_lua_states_incrementally() immediately update the Lua policy on Grantor servers, but they have no effect on policy decisions that Gatekeeper servers had previously received. One has to wait for those policy decisions to expire, or to flush flow tables to force Gatekeeper servers to query Grantor servers again for those flows.
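The behavior described above can be pictured with a small toy model. This is only a sketch for illustration, not Gatekeeper's implementation: the `Grantor` and `Gatekeeper` classes, the `decide` and `flush` methods, and the "granted"/"declined" strings are all hypothetical names. It shows a cached decision surviving a policy update until it expires or the cache is flushed.

```python
class Grantor:
    """Toy stand-in for a Grantor server holding the current policy."""

    def __init__(self, policy):
        self.policy = policy

    def update_policy(self, policy):
        # Takes effect immediately on the Grantor side.
        self.policy = policy


class Gatekeeper:
    """Toy stand-in for a Gatekeeper server that caches policy decisions."""

    def __init__(self, grantor, expire_sec):
        self.grantor = grantor
        self.expire_sec = expire_sec
        self.cache = {}  # flow -> (decision, expires_at)

    def decide(self, flow, now):
        entry = self.cache.get(flow)
        if entry is not None and now < entry[1]:
            return entry[0]  # cached decision, the Grantor is not consulted
        decision = self.grantor.policy(flow)
        self.cache[flow] = (decision, now + self.expire_sec)
        return decision

    def flush(self):
        # Hypothetical analogue of flushing the flow table.
        self.cache.clear()


flow = ("192.0.2.44", "203.0.113.10")
grantor = Grantor(lambda f: "granted")
gk = Gatekeeper(grantor, expire_sec=60)

first = gk.decide(flow, now=0)           # queries the Grantor
grantor.update_policy(lambda f: "declined")
cached = gk.decide(flow, now=30)         # still the old decision
after_expiry = gk.decide(flow, now=60)   # expired, so the new policy applies

grantor.update_policy(lambda f: "granted")
gk.flush()                               # forces a fresh query
after_flush = gk.decide(flow, now=62)
```

In this model the policy change at time 0 is invisible to the flow until time 60, while the second change becomes visible right after the flush.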

One can use dylib.c.gk_flush_flow_table() to flush a flow table, but one should avoid flushing flow tables often. I recommend only flushing in testing environments and when a policy change is critical to deal with an ongoing attack. This recommendation is because a flush forces Gatekeeper servers to read all the flow tables to find the flows to be dropped. All these reads are time-consuming and mess up the cache of the CPUs, so Gatekeeper may not be able to keep up with the traffic while it's flushing the flow table.

Notice that dylib.c.gk_flush_flow_table() is demanding even when the passed source and destination prefixes are very precise. The big problem is that flow tables are huge. A typical flow table takes more than 80% of the RAM of a server. Reading all that alone is time-consuming and messes up the cache of the CPUs. Less precise filters make it even worse by adding lots of writes on top of the already heavy reads.

You can pass a zero-length prefix (e.g. 0.0.0.0/0) as the source prefix to dylib.c.gk_flush_flow_table(), so that it effectively filters only on the destination prefix. The source and destination prefixes passed to dylib.c.gk_flush_flow_table() work like the WHERE clause of a SQL query.
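To make the WHERE-clause comparison concrete, here is a minimal Python sketch of the matching semantics. It is a model only, not Gatekeeper code: the `flush_flow_table` function below, the dictionary-based flow table, and the sample addresses are assumptions made for illustration. A flow is flushed only when its source falls in the source prefix and its destination falls in the destination prefix, so a zero-length source prefix matches every source.

```python
import ipaddress


def flush_flow_table(flow_table, src_prefix, dst_prefix):
    """Return (remaining flows, number flushed) for a toy flow table.

    A flow is flushed when its source is inside src_prefix AND its
    destination is inside dst_prefix, like a SQL WHERE clause.
    """
    src_net = ipaddress.ip_network(src_prefix)
    dst_net = ipaddress.ip_network(dst_prefix)
    kept = {}
    flushed = 0
    # Every entry is visited, no matter how precise the prefixes are.
    for (src, dst), decision in flow_table.items():
        src_match = ipaddress.ip_address(src) in src_net
        dst_match = ipaddress.ip_address(dst) in dst_net
        if src_match and dst_match:
            flushed += 1
        else:
            kept[(src, dst)] = decision
    return kept, flushed


flows = {
    ("198.51.100.7", "203.0.113.10"): "granted",
    ("192.0.2.44", "203.0.113.10"): "declined",
    ("192.0.2.44", "203.0.113.200"): "granted",
}

# 0.0.0.0/0 matches any source, so only the destination prefix filters.
remaining, flushed = flush_flow_table(flows, "0.0.0.0/0", "203.0.113.10/32")
```

Even in this toy version the loop touches every entry before deciding what to drop, which mirrors why a flush is expensive on a table that fills most of a server's RAM.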

Under regular operation, it is better to let policy decisions expire.

Thank you for creating a new issue for the SYN flood question, namely issue #578. Having a single topic per issue makes it easier to reply and helps those searching for a similar problem.

AltraMayor avatar Jul 12 '22 19:07 AltraMayor