100+ Gbps deployment

Open stefanfransen opened this issue 5 years ago • 5 comments

Hi,

We would like to test Gatekeeper in our lab and were wondering about its scalability. Our production network has about 400 Gbps of capacity, and using 40 10Gbit ports would be quite inefficient imo.

So, have any tests been done with 40 or 100 Gbit NICs? If not, we would like to be the first to test this.

Thank you. Stefan Fransen

stefanfransen, May 15 '19 10:05

Hi Stefan,

You are absolutely right that using 40x 10Gbps ports would be quite inefficient. We don't have 40 or 100Gbps NICs available for our tests, so we cannot promise performance on them at this point. The deployment we are working on uses 10Gbps links at a single Internet exchange. We are being conservative with this deployment because, although modest, it is the very first production deployment of Gatekeeper. We expect this deployment to give us at least two things: a test of Gatekeeper under production conditions, and proof that we can write a policy that meets production demands. This last mile has been taking a lot of effort, since production is unforgiving of small details.

On scaling Gatekeeper, the general picture is as follows: packets are hashed (i.e., RSS) into lockless queues (i.e., NIC multiqueues), and a dedicated core is associated with each queue to serve it. Thus, Gatekeeper should scale well as long as the machine has enough cores and its bus can keep up with the storm of traffic. Of course, this is not the whole story: when the bus and cores are not the bottleneck, the most likely bottleneck is non-local memory access.
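
To make the dispatch path above concrete, here is a minimal Python sketch of RSS-style flow-to-queue mapping. It is an illustration under stated assumptions, not Gatekeeper's code: on real hardware the NIC computes the hash and fills the queues. The hash is the Toeplitz hash that Microsoft's RSS specification defines; the 40-byte key is the sample key commonly used in RSS examples (an assumption, since a deployment programs its own key); and the 128-entry redirection table (RETA), `RssDispatcher`, and the one-core-per-queue mapping are hypothetical choices for the sketch.

```python
import ipaddress
import struct

# 40-byte sample RSS key commonly used in RSS examples (an assumption:
# a real deployment programs its own key into the NIC).
RSS_KEY = bytes([
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
])


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Toeplitz hash: XOR together the 32-bit key window that starts at each set input bit."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least 4 bytes longer than the input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):  # input bit i, MSB first
            # Key bits i..i+31 as a 32-bit integer.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def ipv4_tuple(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Hash input for IPv4/TCP: source address, destination address, source port, destination port."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))


class RssDispatcher:
    """Hypothetical model: flow -> RETA entry -> RX queue -> the one core that polls that queue."""

    def __init__(self, num_queues: int, reta_size: int = 128):
        if reta_size & (reta_size - 1):
            raise ValueError("reta_size must be a power of two")
        # Spread RETA entries round-robin so each queue owns an equal share of hash buckets.
        self.reta = [i % num_queues for i in range(reta_size)]
        # One dedicated core per queue: no queue is shared, so no locks are needed.
        self.queue_to_core = {q: q for q in range(num_queues)}

    def queue_for(self, src_ip, dst_ip, src_port, dst_port) -> int:
        h = toeplitz_hash(RSS_KEY, ipv4_tuple(src_ip, dst_ip, src_port, dst_port))
        # The low-order bits of the hash index the redirection table.
        return self.reta[h & (len(self.reta) - 1)]


if __name__ == "__main__":
    d = RssDispatcher(num_queues=4)
    q = d.queue_for("66.9.149.187", "161.142.100.80", 2794, 1766)
    print(f"flow -> queue {q} -> core {d.queue_to_core[q]}")
```

Because every packet of a flow hashes to the same queue, the core serving that queue sees all of the flow's packets, which is what allows each queue to be served by a single core without locks.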

Gatekeeper is already aware of NUMA and of processor caches, and we have low hanging improvements available to pursue (see issue #36 for an example). My educated guess is that these low hanging improvements can make Gatekeeper match 40Gbps NICs, and perhaps some more, but not 100Gbps NICs. To operate at 100Gbps and above, we will need to minimize non-local memory accesses further on the hot path and to implement a memory access pipeline using coroutines. These last improvements are a lot of work, and we haven't started on them.
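
The memory access pipeline using coroutines is not implemented in Gatekeeper, so the following is only a minimal Python sketch of the general idea, under stated assumptions: each table lookup is written as a coroutine (here a generator) that issues a prefetch for the bucket it needs and then yields, so a scheduler can start the other lookups in the batch while that load would be in flight. Python cannot issue CPU prefetch instructions, so the prefetch is only recorded in a trace; `build_table`, `lookup`, and `run_batch` are hypothetical names, and a real version would rely on native prefetch instructions.

```python
from collections import deque


def build_table(items, num_buckets=8):
    """Hypothetical chained hash table: a list of buckets, each a list of (key, value) pairs."""
    table = [[] for _ in range(num_buckets)]
    for key, value in items:
        table[hash(key) % num_buckets].append((key, value))
    return table


def lookup(table, key, trace):
    """Coroutine: prefetch the bucket, suspend, then read the bucket when resumed."""
    bucket = table[hash(key) % len(table)]
    trace.append(("prefetch", key))  # stand-in for a real prefetch instruction
    yield                            # suspend so other lookups overlap their memory stalls
    trace.append(("access", key))
    for k, v in bucket:
        if k == key:
            return v
    return None


def run_batch(table, keys, trace):
    """Round-robin scheduler over a batch of lookup coroutines."""
    results = [None] * len(keys)
    pending = deque((i, lookup(table, k, trace)) for i, k in enumerate(keys))
    while pending:
        i, co = pending.popleft()
        try:
            next(co)                 # run until the next suspension point
            pending.append((i, co))  # not finished: requeue behind the others
        except StopIteration as done:
            results[i] = done.value  # coroutine finished; collect its return value
    return results
```

With a batch of three keys, the trace shows all three prefetches issued before the first bucket is read; that overlap is what a real pipeline exploits to hide non-local memory latency.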

Finally, I should highlight that the constraints mentioned above concern a single machine. Any production deployment will have more than one machine for redundancy, connectivity demands, a larger memory pool, and sheer scalability. Gatekeeper servers currently load balance using ECMP, and we have a solution, not yet implemented, for deployments in which ECMP is not reasonable. Thus, Gatekeeper could still match 400Gbps by using several Gatekeeper servers.
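
The scale-out arithmetic and the ECMP step can be sketched in a few lines of Python. Two assumptions are labeled here: each server is taken to sustain one 40Gbps NIC (the educated guess above, not a measurement), and the ECMP hash is CRC32 over the 5-tuple, whereas real routers use their own vendor-specific hash functions. `servers_needed` and `ecmp_next_hop` are hypothetical names.

```python
import math
import zlib


def servers_needed(total_gbps: float, per_server_gbps: float, spares: int = 1) -> int:
    """Servers to carry total_gbps at per_server_gbps each, plus spare servers for redundancy."""
    return math.ceil(total_gbps / per_server_gbps) + spares


def ecmp_next_hop(five_tuple: tuple, num_paths: int) -> int:
    """Modulo-N ECMP: every packet of a flow hashes to the same path (server).

    CRC32 is an assumption for illustration; routers use vendor-specific hashes.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % num_paths
```

For example, `servers_needed(400, 40)` gives 11: ten servers for capacity plus one spare. A known drawback of modulo-N hashing is that changing N (for example, when a server fails) remaps most flows to a different server; hash-threshold and resilient-hashing schemes reduce that churn.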

If, knowing where we are now, you still want to work with us, let's meet. If you are based in Boston, we can meet in person; otherwise, we can start with a video conference.

AltraMayor, May 15 '19 14:05

Hi,

Thanks for your detailed response. The use of multiple servers with 40Gbps NICs + ECMP is what I was thinking as well.

A video conference sounds good. I'm based in the Netherlands, so I don't think meeting in person is an option, haha. Which conferencing platform do you prefer?

stefanfransen, May 15 '19 15:05

We are used to Google Hangouts. Reach me at [email removed] to schedule a meeting.

AltraMayor, May 15 '19 19:05

> My educated guess is that these low hanging improvements can make Gatekeeper match 40Gbps NICs, and perhaps some more, but not 100Gbps NICs. To operate at 100Gbps and above, we will need to minimize non-local memory accesses further on the hot path and to implement a memory access pipeline using coroutines.

This is perfectly doable at 100G and beyond, judging by the performance of DPDK and Snabb (Lua) applications. We have been running our DPDK-based (but not open-source) DDoS scrubbers at 100G since 2015 to match core/uplink speeds. Batching and prefetching are the keys. :)
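
The arithmetic behind "batching and prefetching are the keys" can be made explicit. This sketch assumes minimum-size 64-byte Ethernet frames (the worst case for packet rate) plus 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, and inter-frame gap). The per-burst and per-packet costs in the example are hypothetical numbers, not measurements of Gatekeeper or of the scrubbers mentioned above.

```python
# Per-frame wire overhead in bytes: preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12).
ETH_WIRE_OVERHEAD = 20


def line_rate_pps(link_gbps: float, frame_bytes: int = 64) -> float:
    """Maximum packets per second on an Ethernet link for a given frame size (FCS included)."""
    return link_gbps * 1e9 / ((frame_bytes + ETH_WIRE_OVERHEAD) * 8)


def per_packet_budget_ns(link_gbps: float, cores: int, frame_bytes: int = 64) -> float:
    """Nanoseconds each core may spend per packet if RSS spreads the load evenly over `cores`."""
    return cores * 1e9 / line_rate_pps(link_gbps, frame_bytes)


def amortized_cost_ns(per_burst_ns: float, per_packet_ns: float, burst: int) -> float:
    """Per-packet cost when a fixed per-call cost is shared by a burst of packets."""
    return per_burst_ns / burst + per_packet_ns


if __name__ == "__main__":
    print(f"10G:  {line_rate_pps(10) / 1e6:.2f} Mpps")
    print(f"100G: {line_rate_pps(100) / 1e6:.2f} Mpps, "
          f"{per_packet_budget_ns(100, cores=1):.2f} ns/packet on one core")
    # Hypothetical costs: 64 ns per receive call, 10 ns of work per packet.
    for burst in (1, 32):
        print(f"burst {burst:2d}: {amortized_cost_ns(64, 10, burst):.1f} ns/packet")
```

At 100G with minimum-size frames, one core would have about 6.72 ns per packet, far less than a single cache miss to DRAM typically costs, so the load has to be spread over cores and memory latency hidden. In the made-up example, a burst of 32 shrinks the fixed per-call share from 64 ns to 2 ns per packet.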

BTW, given the current colo/cross-connect/transit cost structure, it frequently pays off to have fewer network uplinks and PoPs, each with greater capacity.

pawmal, Aug 22 '19 14:08

Thank you for the information @pawmal.

Would you be willing to help us test Gatekeeper at 40Gbps or higher?

AltraMayor, Aug 22 '19 15:08