
[Enhancement]: Separate PRUDP packet handling and RMC processing into separate threads

Open jonbarrow opened this issue 8 months ago • 0 comments

Checked Existing

  • [x] I have checked the repository for duplicate issues.

What enhancement would you like to see?

Currently the way we handle incoming packets is like so:

  1. UDP packet comes into nex-go
  2. A new goroutine is spawned for the packet and handleSocketMessage is called
  3. handleSocketMessage exits, closing its goroutine, and then a 2nd goroutine is spawned for the packet, calling processPacket
  4. The packet gets decoded and handled here, the RMC message parsed, etc.
  5. A message handler, such as the ones from nex-protocols-common-go, is called for the packet
  6. The message handler returns an RMC message to nex-protocols-go
  7. nex-protocols-go then creates a new response PRUDP packet and sends it through nex-go
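
For context, a simplified sketch of that per-packet goroutine pattern (handleSocketMessage and processPacket are the names used above, but the signatures and types here are assumptions for illustration only, not the actual nex-go code):

```go
// Simplified sketch of the current flow; not the actual nex-go code.
package sketch

import "net"

type rawPacket struct {
	data []byte
	addr *net.UDPAddr
}

func listen(conn *net.UDPConn) {
	buf := make([]byte, 64000)

	for {
		n, addr, err := conn.ReadFromUDP(buf)
		if err != nil {
			continue
		}

		data := make([]byte, n)
		copy(data, buf[:n])

		// Step 2: a goroutine is spawned per packet...
		go handleSocketMessage(rawPacket{data: data, addr: addr})
	}
}

func handleSocketMessage(p rawPacket) {
	// Step 3: ...which exits almost immediately after spawning a 2nd
	// goroutine whose lifetime is tied to the full RMC handler.
	go processPacket(p)
}

func processPacket(p rawPacket) {
	// Steps 4-7: decode PRUDP, parse RMC, call the protocol handler,
	// build and send the response. If the handler blocks (matchmaking,
	// DataStore, retransmission bookkeeping), this goroutine lingers.
}
```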

This is all well and good in concept, but it has shown itself to have a couple of issues. One specifically is that we have, since basically the beginning, had an issue with lingering goroutines. These goroutines get parked and tend to never exit properly, which balloons over time and causes issues for things like matchmaking (we have lingering dead clients which are no longer actually connected, resulting in p2p errors) and system performance (goroutines are not free, and lingering ones continue to eat resources).

The obvious solution to this is to ensure that goroutines exit as fast as possible. In the current implementation, a goroutine is spawned the moment a packet is received and closed basically right away, a new goroutine is spawned, and then that second goroutine doesn't get closed until the packet is fully processed. This has a real impact on long-running work, such as complex methods in matchmaking and DataStore.

I'm sure the underlying cause for this in our implementation is just a bug somewhere (such as packet retransmission and client references not being cleared, which will keep goroutines around as well), but there are other options we have to mitigate things while we work on that, which also bring a number of other benefits to the table.

The solution/mitigation for this proposal is simple: stop spawning a single long-running goroutine which can possibly hang forever. In order to do this effectively, we would need to split the processing of incoming PRUDP packets from their RMC processing. That way the PRUDP server can focus only on processing PRUDP packets and exit goroutines basically as soon as they are done being processed; the lifetime of the PRUDP goroutine would no longer be linked to the RMC method/handler.

The idea is simple in concept:

  1. Create a message queue for incoming messages
  2. The PRUDP server will accept incoming PRUDP packets and process them only to the point of decoding the RMC payload
  3. The PRUDP server will push the packet/message into the queue
  4. A number of handlers/listeners (which are not goroutines, but real separate processes) will take packets/messages from the queue and process them
  5. The handler/listener will form the response packet and send it back to the main PRUDP server (likely also in a queue)
  6. The main PRUDP server will then send the response packet to the specified client
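
A minimal in-process sketch of that flow, using Go channels as a stand-in for the queue (the real proposal would put handlers in separate processes, so the channels here only illustrate the decoupling; all names and types are made up):

```go
// In-process sketch of the proposed split; channels stand in for the
// real queue/IPC mechanism, and all names and types are illustrative.
package sketch

type rmcRequest struct {
	connectionID uint32 // enough to route the response back later
	payload      []byte // RMC payload, decoded by the PRUDP server
}

type rmcResponse struct {
	connectionID uint32
	payload      []byte
}

// runPRUDPServer only decodes PRUDP and queues the RMC payload
// (steps 2-3); its per-packet goroutines can exit immediately.
// It also drains the response queue and sends packets out (step 6).
func runPRUDPServer(requests chan<- rmcRequest, responses <-chan rmcResponse) {
	go func() {
		for res := range responses {
			sendResponsePacket(res)
		}
	}()
	// ... accept UDP, decode PRUDP, push rmcRequest into `requests`
}

// runHandler is one of N workers (separate processes in the real
// design) doing the slow RMC work (steps 4-5).
func runHandler(requests <-chan rmcRequest, responses chan<- rmcResponse) {
	for req := range requests {
		responses <- rmcResponse{
			connectionID: req.connectionID,
			payload:      handleRMC(req.payload),
		}
	}
}

func handleRMC(payload []byte) []byte { return payload } // placeholder
func sendResponsePacket(rmcResponse)  {}                 // placeholder
```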

A design like this is what Nintendo also used (we presume). We know for a fact that NEX/RDV allowed handlers to be written in Python scripts, while the main server was written in C/C++. This tells us that they, too, used a split architecture.

Pros

This design allows the PRUDP server to only care about raw PRUDP packets, and not let goroutines pile up. All of the "hard" processing is done in the handler/listener processes, without affecting the server's ability to accept new packets. Another benefit to this type of design is that it, in theory, allows for more efficient scaling if need be. Multiple handler processes can be spawned (possibly even dynamically based on load) to handle larger or smaller amounts of traffic as we see fit, without needing to use multiple real servers or getting stuck relying on goroutines. This should, in theory, severely limit the number of parked goroutines we see.

Cons

This will, unfortunately, likely severely complicate things and require redesigns at all levels of our stack. Currently the entire ecosystem we've built is designed on the assumption that a single process is being used, allowing us to take some shortcuts in places and reference data at multiple layers very easily (such as referencing the connected client, which is PRUDP-layer data, at the RMC handler layer, since it's the same packet/pointers being passed around everywhere). Splitting this up into multiple processes loses our ability to (trivially) share data in this way. There are a number of options for this, each with its own pros and cons, such as (but not limited to):

  1. Shared memory. This can work, but is complicated to set up and a bit annoying to work with. Though in terms of performance I think it would be the best?
  2. Passing data around through IPC. An internal gRPC connection between the main process and handlers, Unix domain sockets, etc. This might be a bit better in terms of DX than shared memory, but comes at a perf cost and means we have to serialize everything when passing it around, which has its own set of caveats
  3. Rework how these layers work entirely so they don't rely on shared data at all. I can only assume this is how NEX/RDV was structured, but I have no idea how to model that right now or how the DX would look
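
As an example of what option 2 implies: the handler process could no longer dereference the live connection object, so the queued message would need to carry a serialized snapshot of whatever PRUDP-layer state the RMC layer actually uses. A hypothetical shape (the field names and the choice of gob are placeholders, just to show the serialization step that option 2 forces on us):

```go
package sketch

import (
	"bytes"
	"encoding/gob"
)

// queuedRMCMessage is a hypothetical snapshot of the PRUDP-layer state an
// RMC handler needs, carried in place of the live connection pointer.
// Field names are illustrative, not the real nex-go types.
type queuedRMCMessage struct {
	ConnectionID uint32 // lets the main server route the response back
	PID          uint64 // caller identity, normally read off the connection
	StreamID     uint8
	RMCPayload   []byte // decoded RMC request body
}

// encode/decode illustrate the serialization cost mentioned above; gob is
// just one option (protobuf over gRPC, etc. would work the same way).
func encode(m queuedRMCMessage) ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(m)
	return buf.Bytes(), err
}

func decode(b []byte) (queuedRMCMessage, error) {
	var m queuedRMCMessage
	err := gob.NewDecoder(bytes.NewReader(b)).Decode(&m)
	return m, err
}
```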

We also would need to create a queue/packet-passing system so that the main UDP server and handler processes can pass packets between each other. I believe either a lightweight internal gRPC connection or a shared Unix domain socket would be best here, but there may be a better/"more Go" way of doing this.
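
For the Unix domain socket option, a rough sketch of what the transport could look like (the socket path, length-prefix framing, and function names are all placeholders, not a proposed wire format):

```go
// Rough sketch of streaming queued messages from the main PRUDP server
// process to a handler process over a Unix domain socket, using simple
// length-prefix framing. Everything here is a placeholder.
package sketch

import (
	"encoding/binary"
	"io"
	"net"
)

const socketPath = "/tmp/nex-rmc-queue.sock" // placeholder path

// serveQueue runs in the main server process and writes each serialized
// message to a single connected handler process.
func serveQueue(messages <-chan []byte) error {
	ln, err := net.Listen("unix", socketPath)
	if err != nil {
		return err
	}
	defer ln.Close()

	conn, err := ln.Accept()
	if err != nil {
		return err
	}
	defer conn.Close()

	for msg := range messages {
		var length [4]byte
		binary.BigEndian.PutUint32(length[:], uint32(len(msg)))
		if _, err := conn.Write(length[:]); err != nil {
			return err
		}
		if _, err := conn.Write(msg); err != nil {
			return err
		}
	}
	return nil
}

// readQueue runs in a handler process and yields one serialized message
// at a time until the connection closes.
func readQueue() (<-chan []byte, error) {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return nil, err
	}

	out := make(chan []byte)
	go func() {
		defer close(out)
		defer conn.Close()
		for {
			var length [4]byte
			if _, err := io.ReadFull(conn, length[:]); err != nil {
				return
			}
			msg := make([]byte, binary.BigEndian.Uint32(length[:]))
			if _, err := io.ReadFull(conn, msg); err != nil {
				return
			}
			out <- msg
		}
	}()
	return out, nil
}
```

A gRPC stream would replace the hand-rolled framing with protobuf messages, at the cost of the extra dependency; either way the cross-process version ends up shaped roughly like this.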

Any other details to share? (OPTIONAL)

Related to:

  • https://github.com/PretendoNetwork/nex-go/issues/71
  • https://github.com/PretendoNetwork/nex-go/issues/73

jonbarrow · Mar 11 '25 16:03