Rethink Gateway Accept Logic

Open htdvisser opened this issue 4 years ago • 0 comments

Summary

We may have to change the way the Gateway Server handles new gateway connections. We discussed this on Slack some time ago, and I also wrote some of it down in a comment at https://github.com/TheThingsIndustries/lorawan-stack/pull/1975#issuecomment-613356493, but I forgot to create an actual issue for it.

Why do we need this?

Because the current implementation may cause performance issues: the UDP frontend treats every received packet as a potential new connection.

What is already there? What do you see now?

The current implementation of the UDP frontend in the Gateway Server is very packet-oriented. This means that on every received UDP packet, it tries to find an existing connection for the EUI in the packet, or create a new connection otherwise. This could mean that on networks with require-registered-gateways, it would do a lookup (resulting in a NotFound) for every UDP packet. This could also mean that a single IP:port can send packets containing many different EUIs, and practically DoS the Gateway Server.

What is missing? What do you want to see?

Legitimate gateways running the UDP packet forwarder are actually relatively "connectionful", which we can see from the fact that the source IP:port stays the same for long periods. As a result, we don't need to try setting up a new connection whenever we receive a UDP packet with a different EUI. We only need to do so when we see a UDP packet from an IP:port that we haven't seen before.

If we do it this way, we can quite easily add some extra checks before connecting gateways, such as limiting the total number of connections per IP, rate-limiting new connections per IP, or even queueing new connections per IP. We can also reduce the load on networks with require-registered-gateways, since we only have to look up the gateway once, when the "connection" starts, instead of on every packet.

How do you propose to implement this?

  • Keep a map[RemoteAddr]*UDPConn instead of a map[EUI]*state
    • The lifetime is for as long as the connection doesn't time out, so we won't have to retry connecting a gateway on each UDP packet we receive.
    • The UDPConn has a buffered channel for raw UDP packets, so we have "natural" rate limiting and flood protection.
  • When the GS receives a UDP packet for an existing connection (it's in the map), it just writes the packet to the connection's channel for raw UDP packets.
    • After writing the entry to the channel, the GS can immediately continue reading the next UDP packet.
    • This should be done with a select, since the connection's buffered channel may be full and we still want to keep handling other connections.
  • When the GS receives a UDP packet for a new connection (not yet in the map), create a new entry in the map, and add the UDPConn to some "accept queue"
    • After adding the entry to the accept queue, the GS can immediately continue reading the next UDP packet.
    • We may want to have a configurable "rate limit" on the accept queue, maybe even per IP address.
    • We may want to have a configurable number of "accept handlers".
    • The accept handler unmarshals the packet, checks if the gateway is allowed to connect with unauthenticated UDP, perhaps checks an IP whitelist, checks if there are any existing connections for the gateway (using ID claims), and then starts a "read" goroutine whose behavior depends on the result:
    • If the connection is accepted, the read routine reads from the raw UDP packet channel, unmarshals+checks the packet, ACKs it, applies rate limits, ... Then it passes the LoRa uplinks on to the existing s.packetCh from where it's distributed to a number of packet handlers.
      • Added benefit: moving unmarshaling out of the "main" reader increases parallelism :rocket:
    • If the connection is rejected, the read routine drops all packets and perhaps retries the "connect" once in a while. In the future it could even add some firewall rule to (temporarily) block the conn at kernel level.
    • Both of these goroutines also have a timeout that cleans up the connection if it's inactive for too long.

htdvisser, May 07 '20 08:05