
Improve serialization

Open cBournhonesque opened this issue 11 months ago • 6 comments

There are a bunch of tradeoffs regarding serialization/deserialization.

The current strategy is:

  • for encoding, we maintain one Encode buffer per client. When we want to buffer a message for a single client, we first encode it as Bytes. When we are ready to send messages (every send_interval), we gather all the messages that need to be sent on each channel and provide their sizes to the priority manager, which adds messages from highest priority to lowest until the bandwidth budget is used up (we know each message's size because it has already been encoded as Bytes). Then we pack those individual messages into Packets of at most 1200 bytes, using continue bits at channel boundaries ("are there more messages for this channel?" and "is there another channel?"); see the sketch after this list.
  • the reason we want to limit packets to 1200 bytes (when we can; some individual messages might be bigger than 1200 bytes, for example if the replication_group is big) is that fragmented packets can introduce higher latency/packet loss: if a single fragment is lost, the entire packet is lost!
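
A minimal sketch of that packing step, with hypothetical types rather than lightyear's actual API (continue bits are modeled as whole bytes, and fragmentation of oversized messages is left out):

```rust
use bytes::Bytes;

const MAX_PACKET_SIZE: usize = 1200;

/// Pack already-encoded messages, grouped per channel, into packets of at
/// most MAX_PACKET_SIZE bytes. Simplified: continue "bits" are written as
/// whole bytes, and a message larger than MAX_PACKET_SIZE (which would
/// become a FragmentedPacket) is not handled here.
fn pack(channels: &[(u8, Vec<Bytes>)]) -> Vec<Vec<u8>> {
    let mut packets: Vec<Vec<u8>> = vec![Vec::with_capacity(MAX_PACKET_SIZE)];
    for (channel_id, messages) in channels {
        for message in messages {
            // channel id + "more data?" flag + payload
            let needed = 2 + message.len();
            let current_len = packets.last().unwrap().len();
            if current_len + needed > MAX_PACKET_SIZE && current_len > 0 {
                // message doesn't fit: start a new packet
                packets.push(Vec::with_capacity(MAX_PACKET_SIZE));
            }
            let packet = packets.last_mut().unwrap();
            packet.push(*channel_id);
            packet.push(1); // continue flag: more data follows in this packet
            packet.extend_from_slice(message);
        }
    }
    packets
}
```

In the real packer the channel id is written once per channel section rather than per message, and the continue bits are actual bits, but the byte accounting works the same way: it is only possible because every message's encoded size is known up front.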

We could switch to the next version of bitcode.

  • it is much faster, and the data is much more compressible (for example, a Vec of structs {a, b, c} will be encoded as aaaaabbbbbccccc instead of abcabcabc...); the latest version also needs no serialization hints. (See the bitcode sketch after this list.)
  • how does it work? This time there is no bit-padding; instead, each 'section' (for example aaaaa or bbbbb) is byte-padded. There is no easy way to check the number of bits used, so we need a strategy for deciding what fits under the rate limiter.
  • potential strategy A:
    • we order the messages by priority (highest first), then try to encode messages[0..k], doubling k each time, until we exceed the rate limiter. (Then we can binary-search to find exactly how many messages stay under the rate limiter; see the sketch after this list.) (Do we run compression before the rate limiter check?)
    • now we have identified the messages we want to send. We can:
        1. encode messages[0..n] directly, which gives us a Bytes. If it's too big, we split it into a FragmentedPacket. The problem is that we then have no control over whether fragmented packets get created. But is fragmentation really a problem? TCP games work fine even with it.
        2. do another binary-search strategy to serialize data into packets of at most MTU size. Individual messages bigger than the MTU might have to be handled separately. This time we avoid fragmentation, but there is some wasted CPU because we encode the same data multiple times. Do we run compression as well?
      • maybe start with option 1 first, because packets might very rarely exceed the MTU? Especially with bitcode's efficient compression.
  • potential strategy B:
    • we first encode every message individually to get its size. The sum of the individually-encoded sizes will be much bigger than the size of encoding messages[0..n] as one batch, because with multiple messages we get better padding, better compression, etc. We can then do everything similarly to what we do now, using these sizes as an upper bound.
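
To make the compressibility point and strategy B's overestimate concrete, here is a small sketch using bitcode's derive API (the Sample type and data are made up for illustration; exact sizes depend on the bitcode version and the data):

```rust
// Requires the bitcode crate with its derive feature.
use bitcode::{Decode, Encode};

#[derive(Encode, Decode)]
struct Sample {
    a: u8,
    b: u16,
    c: u32,
}

fn main() {
    let messages: Vec<Sample> = (0..1000u32)
        .map(|i| Sample { a: 1, b: 2, c: i })
        .collect();

    // Encoding the whole batch lets bitcode lay the data out as
    // aaaa...bbbb...cccc... (each section byte-padded), which is
    // far more compact than encoding messages one by one.
    let batch = bitcode::encode(&messages);

    // Strategy B encodes each message individually to learn its size;
    // the sum of those sizes overestimates the real batch size.
    let individual_total: usize = messages
        .iter()
        .map(|m| bitcode::encode(m).len())
        .sum();

    println!(
        "batch: {} bytes, sum of individual: {} bytes",
        batch.len(),
        individual_total
    );
}
```

The gap between the two numbers is the slack that strategy B would leave unused under the rate limiter.

And here is a minimal sketch of strategy A's search, assuming a hypothetical encoded_size helper that batch-encodes a prefix (e.g. with bitcode) and returns its byte length. Note that every probe re-encodes the prefix, which is the wasted CPU mentioned in option 2:

```rust
/// Find the largest k such that encoding messages[0..k] as one batch
/// stays within `budget` bytes (the rate limiter's allowance).
fn max_prefix_within_budget<M>(
    messages: &[M],
    budget: usize,
    encoded_size: impl Fn(&[M]) -> usize,
) -> usize {
    // Phase 1: double k until messages[0..k] no longer fits (or we run out).
    let mut k = 1;
    while k <= messages.len() && encoded_size(&messages[..k]) <= budget {
        k *= 2;
    }
    let mut hi = k.min(messages.len()); // upper bound on the answer
    let mut lo = hi / 2; // largest prefix known to fit (0 if k=1 failed)

    // Phase 2: binary search for the exact largest prefix that fits.
    while lo < hi {
        let mid = (lo + hi + 1) / 2;
        if encoded_size(&messages[..mid]) <= budget {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    lo
}
```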

cBournhonesque · Mar 16 '24