
A new flow table

Open · AltraMayor opened this issue on Nov 25 '19

The current flow tables use the DPDK hash library. While this library provided Gatekeeper with a good starting point, it has some rough edges that conflict with the way the GK blocks use it. The two salient points are the following:

  1. DPDK hash tables have a key store that maintains a copy of all the keys present in the table. These keys are needed to ensure that a lookup doesn't confuse two entries with identical hash values. The key store also eliminates the need for entries to carry a copy of their keys. Nevertheless, when entries must have a copy of their keys anyway, as flow entries do, the key store brings two disadvantages (see the sketch after this list):
     1.1. Duplicate, wasted memory. A flow key in Gatekeeper is 36 bytes long, and a small flow table has 2^20 entries, so the key store alone takes 36MB of memory.
     1.2. One of the last steps of an entry lookup is to load the key being searched. When the number of entries with the same hash value is small and the flow table is large, it is better to load the entry instead of the key to minimize cache misses on the running core. In Gatekeeper, less than 1% of the buckets have entries with the same hash value.
  2. While DPDK hash tables have fast lookups, they struggle to add new entries when the table has high occupancy. This is critical for the GK blocks during an attack that explores the keyspace. A solution to this problem is not known at this point, but the future solution must balance lookup speed against the occupancy of the table. Section "22.7. Entry distribution in hash table" of the DPDK documentation points out that their implementation can reach up to 95% occupancy.
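
To make point 1 above concrete, here is a minimal sketch in C of a flow entry that embeds its own key, so a lookup compares against the entry directly rather than going through a separate key store. The struct and function names are hypothetical, not Gatekeeper's actual flow-entry code; only the 36-byte key size and the "less than 1% of buckets collide" observation come from this issue.

```c
/*
 * Hypothetical flow entry that embeds its 36-byte key, so a lookup
 * loads the entry itself instead of a separate key-store slot.
 * The layout and names are illustrative only.
 */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* 36-byte flow key: IPv6 source + destination plus a 4-byte field. */
struct flow_key {
	uint8_t  src[16];
	uint8_t  dst[16];
	uint32_t proto;
};

struct flow_entry {
	struct flow_key key;	/* key lives inside the entry */
	uint32_t hash;		/* cached full hash value */
	bool in_use;
	/* ... per-flow state would follow ... */
};

/*
 * Look a key up in one bucket of entries. Since less than 1% of the
 * buckets have entries sharing a hash value, the memcmp() usually runs
 * at most once, and only the entry's own cache lines are touched.
 */
static struct flow_entry *
bucket_lookup(struct flow_entry *bucket, unsigned bucket_size,
	const struct flow_key *key, uint32_t hash)
{
	unsigned i;

	for (i = 0; i < bucket_size; i++) {
		struct flow_entry *e = &bucket[i];

		if (e->in_use && e->hash == hash &&
				memcmp(&e->key, key, sizeof(*key)) == 0)
			return e;
	}
	return NULL;
}
```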

AltraMayor commented on Nov 25 '19 13:11

One direction is to explore the trade-offs in DPDK's hash library among table occupancy, the maximum number of move tries when inserting an item, and throughput.
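
As a first step in that exploration, the following is a minimal sketch, assuming an already-initialized EAL, that fills an rte_hash table with synthetic keys until insertion fails and reports the occupancy reached. The table size, the 36-byte key length, and the rte_jhash choice are only examples, and extra_flag is the field one would vary (e.g. extendable buckets) when comparing configurations; timing the same loop would give the throughput axis.

```c
/*
 * Minimal occupancy probe for DPDK's hash library. Assumes the EAL
 * has already been initialized; table size and key length are just
 * examples (the 36-byte key length matches the flow key above).
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#include <rte_hash.h>
#include <rte_jhash.h>
#include <rte_lcore.h>

#define PROBE_ENTRIES (1U << 20)
#define PROBE_KEY_LEN 36

static void
probe_occupancy(void)
{
	struct rte_hash_parameters params = {
		.name = "occupancy_probe",
		.entries = PROBE_ENTRIES,
		.key_len = PROBE_KEY_LEN,
		.hash_func = rte_jhash,
		.hash_func_initval = 0,
		.socket_id = (int)rte_socket_id(),
		.extra_flag = 0,	/* the knob to vary per experiment */
	};
	struct rte_hash *h = rte_hash_create(&params);
	uint8_t key[PROBE_KEY_LEN] = { 0 };
	uint32_t i;

	if (h == NULL)
		return;

	for (i = 0; i < PROBE_ENTRIES; i++) {
		memcpy(key, &i, sizeof(i));	/* distinct synthetic keys */
		if (rte_hash_add_key(h, key) < 0)
			break;			/* insertion failed */
	}

	printf("Reached %u of %u entries (%.1f%% occupancy)\n",
		i, PROBE_ENTRIES, 100.0 * i / PROBE_ENTRIES);
	rte_hash_free(h);
}
```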

An alternative solution is to use Hopscotch hashing, as it allows adding, deleting, and looking up items with higher throughput while achieving high table occupancy. In particular, Figure 5 of that work shows the throughput for a workload of 60% contains, 20% inserts, and 20% removes. In Gatekeeper's setting, the worst case requires more insert operations and fewer remove operations.
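
For concreteness, here is a minimal sketch of the hopscotch lookup path, assuming a neighborhood of 32 slots, a power-of-two table, and the 36-byte flow key. The names are illustrative (not taken from Gatekeeper or from the Hopscotch hashing paper's code), and insertion, which performs the displacements that keep every key within its neighborhood, is omitted.

```c
/*
 * Sketch of a hopscotch-hashing lookup. Each home bucket i keeps a
 * 32-bit hop_info bitmap: bit j set means slot (i + j) stores a key
 * whose home bucket is i, so a lookup only visits flagged slots.
 */
#include <stdint.h>
#include <string.h>

#define HOP_RANGE 32		/* neighborhood size H */
#define FLOW_KEY_LEN 36		/* same 36-byte flow key as above */

struct hop_slot {
	uint8_t  key[FLOW_KEY_LEN];
	uint32_t hop_info;	/* neighborhood bitmap when used as home bucket */
	uint8_t  in_use;
	/* ... per-flow state would follow ... */
};

struct hop_table {
	struct hop_slot *slots;
	uint32_t mask;		/* capacity - 1; capacity is a power of two */
};

static struct hop_slot *
hop_lookup(const struct hop_table *t, const uint8_t *key, uint32_t hash)
{
	uint32_t home = hash & t->mask;
	uint32_t bitmap = t->slots[home].hop_info;

	while (bitmap != 0) {
		uint32_t j = (uint32_t)__builtin_ctz(bitmap);
		struct hop_slot *s = &t->slots[(home + j) & t->mask];

		if (s->in_use && memcmp(s->key, key, FLOW_KEY_LEN) == 0)
			return s;
		bitmap &= bitmap - 1;	/* clear the lowest set bit */
	}
	return NULL;
}
```

The appeal for the flow table is that a lookup inspects at most HOP_RANGE slots near the home bucket, so high occupancy does not degrade lookups, while the displacement work is paid only on insertion.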

mengxiang0811 commented on Nov 25 '19 17:11

Currently, Gatekeeper's custom version of DPDK adds rte_hash_lookup_bulk_with_hash(), rte_hash_prefetch_buckets(), and rte_hash_prefetch_buckets_non_temporal(). If we implement a flow table independently of DPDK, we can avoid maintaining a custom version of DPDK.
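
An independent flow table could keep the same pipelining benefit without the fork by exposing its own bucket-prefetch step. Below is a minimal sketch that assumes a hypothetical table layout and uses only mainline DPDK helpers (rte_prefetch0() and rte_prefetch_non_temporal()), unlike the rte_hash_* additions listed above.

```c
/*
 * Bucket prefetching for an independent flow table, analogous to what
 * the custom rte_hash_prefetch_buckets*() functions provide. The table
 * layout is hypothetical; the prefetch helpers are standard DPDK.
 */
#include <stdbool.h>
#include <stdint.h>
#include <rte_prefetch.h>

struct flow_bucket {
	/* entries would go here; only the address matters for prefetch */
	uint8_t raw[64];	/* e.g. one cache line per bucket */
};

struct flow_table {
	struct flow_bucket *buckets;
	uint32_t mask;		/* number of buckets - 1 */
};

/*
 * First stage of a two-stage lookup over a burst of packets: issue a
 * prefetch for each packet's bucket so the cache lines are in flight
 * by the time the second stage compares keys.
 */
static inline void
flow_table_prefetch_burst(const struct flow_table *t,
	const uint32_t *hashes, unsigned n, bool non_temporal)
{
	unsigned i;

	for (i = 0; i < n; i++) {
		const void *bkt = &t->buckets[hashes[i] & t->mask];

		if (non_temporal)
			rte_prefetch_non_temporal(bkt);
		else
			rte_prefetch0(bkt);
	}
}
```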

AltraMayor commented on Jun 15 '22 17:06

The function that compares keys in the hash table should receive a (void *arg) parameter to simplify its implementation. The reason is that computing the RSS hash requires the RSS hash key. See gk/main.c:rss_ip_flow_hf() for more information.
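
A minimal sketch of that callback shape, with hypothetical typedef and struct names (only gk/main.c:rss_ip_flow_hf() is an actual Gatekeeper function): both the hash and key-comparison callbacks take a (void *) argument so context such as the RSS hash key can be threaded through without globals.

```c
/*
 * Hypothetical callback signatures for the new flow table: both the
 * hash and the key-comparison callbacks receive a (void *) argument
 * so context such as the RSS hash key can reach them.
 */
#include <stdint.h>
#include <string.h>

typedef uint32_t (*flow_hash_fn)(const void *key, uint32_t key_len,
	uint32_t init_val, void *arg);
typedef int (*flow_key_cmp_fn)(const void *key1, const void *key2,
	uint32_t key_len, void *arg);

/* Context passed to every callback invocation. */
struct flow_table_ctx {
	const uint8_t *rss_hash_key;	/* key needed to compute the RSS hash */
	uint32_t rss_hash_key_len;
};

/*
 * Example comparison callback. A trivial one ignores @arg; one that
 * mirrors the RSS hash computation would read the context instead.
 */
static int
flow_key_cmp(const void *key1, const void *key2, uint32_t key_len, void *arg)
{
	(void)arg;
	return memcmp(key1, key2, key_len);
}
```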

AltraMayor commented on Oct 28 '22 18:10

Pull request #664 addressed this issue.

AltraMayor commented on Jan 30 '24 22:01