Refactor socketioxide parsing

Open Totodore opened this issue 1 year ago • 0 comments

Motivation

Currently, the serialization/deserialization system of socketioxide works as follows:

Deserialization

  • First, the packet is parsed, and a payload is extracted.
  • This payload ([event, ...data]) is immediately parsed into a dynamic value (serde_json::Value). We do this to:
    • Read the event string and route it to the correct handler.
    • If the data consists of a variadic number of arguments, convert it into a serde_json::Value::Array and skip the first element (the event).
    • If the data is a single value, extract it from the array.
  • The payload is then deserialized into the user-provided type before calling the handler.
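
The current deserialization steps above can be sketched roughly as follows. This is a simplified illustration using only the standard library: the `Value` enum is a hypothetical stand-in for `serde_json::Value`, and `split_event` is an invented helper name, not a function from socketioxide.

```rust
/// Hypothetical stand-in for a dynamic JSON value such as `serde_json::Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Number(f64),
    String(String),
    Array(Vec<Value>),
}

/// Splits an already-parsed `[event, ...data]` payload into the event name
/// and the data that is later deserialized into the user-provided type.
/// (Hypothetical helper, for illustration only.)
pub fn split_event(payload: Value) -> Option<(String, Value)> {
    let Value::Array(items) = payload else { return None };
    let mut items = items.into_iter();
    // The first element must be the event name.
    let Some(Value::String(event)) = items.next() else { return None };
    let mut rest: Vec<Value> = items.collect();
    let data = if rest.len() == 1 {
        // Single value: extract it from the array.
        rest.remove(0)
    } else {
        // Variadic arguments: keep them as an array, without the event.
        Value::Array(rest)
    };
    Some((event, data))
}
```

Every element of the payload is already a heap-allocated dynamic value by the time the event name is read, even though only the event string is needed for routing.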

Serialization

  • The data provided by the user is serialized into a serde_json::Value.
  • If the data is an array, we insert the event name at the front of the array (shifting all the elements).
  • If the data is a single value, we wrap the provided value in an array with the event at the front.
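
A minimal sketch of this serialization step, again with a hypothetical `Value` enum standing in for `serde_json::Value` and an invented `attach_event` helper:

```rust
/// Hypothetical stand-in for a dynamic JSON value such as `serde_json::Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    String(String),
    Array(Vec<Value>),
}

/// Builds the `[event, ...data]` payload from already-serialized user data.
/// (Hypothetical helper, for illustration only.)
pub fn attach_event(event: &str, data: Value) -> Value {
    match data {
        Value::Array(mut items) => {
            // Inserting at index 0 shifts every existing element.
            items.insert(0, Value::String(event.to_owned()));
            Value::Array(items)
        }
        // A single value is wrapped in a new array behind the event.
        single => Value::Array(vec![Value::String(event.to_owned()), single]),
    }
}
```

In this sketch, an array passed as one value and the same elements passed as separate arguments produce an identical payload, so the receiver cannot tell them apart.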

Drawbacks

This approach is the simplest way to handle the data payload, but it has several drawbacks:

  • We deserialize the data into a dynamic type, so everything is allocated on the heap.
  • Handling dynamic types is more difficult with multiple parsers.
  • We can't distinguish between a Vec emitted as a single value and a variadic number of arguments (#225).
  • We can't handle complex binary payloads. With socket.io, we should be able to include a binary object anywhere in the structure, but currently, it's only possible to send it at the end of the top-level array (#275).

Solution

First, the codebase is split into multiple crates:

  • The socketioxide_core crate contains all the core types used by the other crates (mainly the parser implementations): Packet, Parse, Sid, Str.
  • The parser_common crate contains all the parsing code for the default parser.
  • The parser_msgpack crate contains all the parsing code for the new msgpack parser.
  • The socketioxide crate contains the rest of the codebase.

The core crate defines a Parse trait that all parsers must implement. The Parse design focuses on deferred parsing and keeping things as simple as possible. Here is the new serde flow with this solution:

Deserialization

  • The packet is parsed, and the entire payload is retained as-is (no deserialization).
  • When socketioxide needs to call the correct handler, it calls the read_event method to simply retrieve a string reference to the event in the payload.
  • Using a custom serde implementation, we check if the user-provided value is a tuple based on the serde model. If it is, we deserialize it directly. Otherwise, we deserialize the first element of the array (solves #225).
  • The payload is then deserialized into the user-provided value. Both parser implementations (common and msgpack) contain custom implementations of serde::{Serialize, Deserialize}, which wrap serde_json and rmp_serde. These wrappers handle the following:
    • For the common parser only, it reinjects all binary data into the appropriate fields (solves #275).
    • For all parsers, it skips the first element of the root-level array since that is the event field.
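
To illustrate the idea behind read_event, here is a simplified, std-only sketch for a JSON payload. The signature is an assumption made for this example and may differ from the actual Parse trait; it also rejects event names containing escape sequences instead of unescaping them.

```rust
/// Returns the event name from a raw `["event", ...]` JSON payload without
/// deserializing anything else. Hypothetical, simplified signature: names
/// that contain a backslash escape are rejected rather than unescaped.
pub fn read_event(payload: &str) -> Option<&str> {
    // Expect the opening bracket of the root-level array.
    let rest = payload.trim_start().strip_prefix('[')?;
    // Expect the opening quote of the first element (the event name).
    let rest = rest.trim_start().strip_prefix('"')?;
    // The event ends at the next quote.
    let end = rest.find('"')?;
    let event = &rest[..end];
    if event.contains('\\') {
        return None;
    }
    // The returned slice borrows from the payload: no allocation, no copy.
    Some(event)
}
```

The rest of the payload stays untouched until the handler's argument type is known, which is what allows the parser to deserialize directly into the user-provided type.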

Serialization

  • We directly encode the user-provided value and the event using the given parser. If the value is tuple-like, we serialize it as multiple arguments (with the event at the front of the argument list) (solves #225).
  • For the common parser, if any binary payloads are included, we extract them into a separate array and replace the binaries with placeholders (solves #275).
  • We then include the pre-serialized payload in the packet that is being serialized.
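
The binary handling for the common parser can be sketched as below. This is an illustration under assumptions, not the actual implementation: the real code works through custom serde wrappers, whereas this sketch walks a hypothetical `Value` tree. `Placeholder(n)` models the socket.io wire form `{"_placeholder": true, "num": n}`.

```rust
/// Hypothetical value tree; `Placeholder(n)` models the socket.io wire form
/// `{"_placeholder": true, "num": n}`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    String(String),
    Binary(Vec<u8>),
    Placeholder(usize),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Serialization side: moves every binary found anywhere in the structure
/// into `out` and leaves a numbered placeholder in its place.
pub fn extract_binaries(value: Value, out: &mut Vec<Vec<u8>>) -> Value {
    match value {
        Value::Binary(bytes) => {
            out.push(bytes);
            Value::Placeholder(out.len() - 1)
        }
        Value::Array(items) => {
            Value::Array(items.into_iter().map(|v| extract_binaries(v, out)).collect())
        }
        Value::Object(fields) => Value::Object(
            fields.into_iter().map(|(k, v)| (k, extract_binaries(v, out))).collect(),
        ),
        other => other,
    }
}

/// Deserialization side: puts the binaries back where the placeholders are.
/// Returns `None` if a placeholder refers to a missing attachment.
pub fn reinject_binaries(value: Value, bins: &[Vec<u8>]) -> Option<Value> {
    Some(match value {
        // The attachment is cloned here, mirroring a `Vec<u8>`-based model.
        Value::Placeholder(i) => Value::Binary(bins.get(i)?.clone()),
        Value::Array(items) => Value::Array(
            items
                .into_iter()
                .map(|v| reinject_binaries(v, bins))
                .collect::<Option<Vec<_>>>()?,
        ),
        Value::Object(fields) => Value::Object(
            fields
                .into_iter()
                .map(|(k, v)| Some((k, reinject_binaries(v, bins)?)))
                .collect::<Option<Vec<_>>>()?,
        ),
        other => other,
    })
}
```

Because the traversal is recursive, a binary can sit at any depth of the structure rather than only at the end of the top-level array.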

Drawbacks of this solution

  • For the common parser, binary payloads are currently cloned during both serialization and deserialization because the serde model uses Vec<u8> rather than Bytes or other structs that are cheap to clone. This is the primary issue with the current solution, though it only affects serde_json; msgpack handles binary data natively.
  • It is not possible to serialize or deserialize an unknown, variadic number of arguments. You must know how many arguments you are sending and receiving, whereas in JavaScript you can emit an arbitrary number of arguments (e.g., socket.emit("event", ...new Array(Math.floor(Math.random() * 1000)))).
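
To make the cloning cost in the first drawback concrete, the following std-only sketch contrasts an owned `Vec<u8>` with a reference-counted buffer. `Arc<[u8]>` is used here only as a stand-in for a cheaply clonable type such as Bytes; both function names are invented for the example.

```rust
use std::sync::Arc;

/// Copying into a new `Vec<u8>` allocates a new buffer and copies every byte.
pub fn clone_owned(data: &[u8]) -> Vec<u8> {
    data.to_vec()
}

/// Cloning an `Arc<[u8]>` only increments a reference count; both handles
/// point at the same underlying buffer.
pub fn clone_shared(data: &Arc<[u8]>) -> Arc<[u8]> {
    Arc::clone(data)
}
```

With the first function the cost grows with the payload size, while with the second it stays constant, which is why a `Vec<u8>`-based serde model is the more expensive option for large binary payloads.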

To Do

  • [ ] Add CI/CD pipelines for msgpack parser
  • [ ] Add documentation
  • [ ] Test and document behavior for using serde_json::Value when deserializing payloads with binaries
  • [x] Add more unit testing for parsers (read_event, ...)
  • [x] Add fuzz testing
  • [x] Add benchmarks for the msgpack parser.
  • [x] Add a feature flag for msgpack parser

Closes:

  • #275
  • #276 (Bin payload is removed)
  • #225
  • #234

Totodore, Sep 29 '24 22:09