socketioxide
Refactor socketioxide parsing
Motivation
Currently, the serialization/deserialization system of socketioxide works as follows:
Deserialization
- First, the packet is parsed, and a payload is extracted.
- This payload (`[event, ...data]`) is immediately parsed into a dynamic value (`serde_json::Value`). We do this to:
  - Read the event string and route it to the correct handler.
  - If the data consists of a variadic number of arguments, convert it into a `serde_json::Value::Array` and skip the first element (the event).
  - If the data is a single value, extract it from the array.
- The payload is then deserialized into the user-provided type before calling the handler.
Serialization
- The data provided by the user is serialized into a `serde_json::Value`.
- If the data is an array, we insert the event name at the front of the array (shifting all the elements).
- If the data is a single value, we wrap the provided value in an array with the event at the front.
Drawbacks
This approach is the simplest way to handle the data payload, but it has several drawbacks:
- We deserialize the data into a dynamic type, so everything is allocated on the heap.
- Handling dynamic types is more difficult with multiple parsers.
- We can't distinguish between a `Vec` emitted as a single value and a variadic number of arguments (#225).
- We can't handle complex binary payloads. With socket.io, we should be able to include a binary object anywhere in the structure, but currently, it's only possible to send it at the end of the top-level array (#275).
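The ambiguity behind #225 can be reproduced with a small std-only model. This is only a sketch: the `Value` enum and the `pack_legacy` and `pack_args` functions are hypothetical stand-ins for `serde_json::Value` and the old packing logic, not code from socketioxide.

```rust
/// Tiny stand-in for a dynamic JSON value (hypothetical, not `serde_json::Value`).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Num(i64),
    Str(String),
    Array(Vec<Value>),
}

/// Old-style packing as described above: an array gets the event inserted
/// at the front, anything else is wrapped in a new `[event, value]` array.
fn pack_legacy(event: &str, data: Value) -> Value {
    let event = Value::Str(event.to_string());
    match data {
        Value::Array(mut items) => {
            items.insert(0, event); // shifts every existing element
            Value::Array(items)
        }
        single => Value::Array(vec![event, single]),
    }
}

/// Explicit variadic packing: every entry of `args` is a separate argument.
fn pack_args(event: &str, args: Vec<Value>) -> Value {
    let mut items = vec![Value::Str(event.to_string())];
    items.extend(args);
    Value::Array(items)
}
```

In this model, `pack_legacy("msg", Value::Array(vec![Value::Num(1), Value::Num(2)]))` yields the same payload as `pack_args("msg", vec![Value::Num(1), Value::Num(2)])`, so the receiver cannot tell a single `Vec` argument from two separate arguments.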
Solution
First, the codebase is split into multiple crates:
- The `socketioxide_core` crate contains all the core types used by the other crates (mainly the parser implementations): `Packet`, `Parse`, `Sid`, `Str`.
- The `parser_common` crate contains all the parsing code for the default parser.
- The `parser_msgpack` crate contains all the parsing code for the new msgpack parser.
- The `socketioxide` crate contains the rest of the codebase.
The core crate defines a `Parse` trait that all parsers must implement. The `Parse` design focuses on deferred parsing and keeping things as simple as possible. Here is the new serde flow with this solution:
Deserialization
- The packet is parsed, and the entire payload is retained as-is (no deserialization).
- When socketioxide needs to call the correct handler, it calls the `read_event` method to simply retrieve a string reference to the event in the payload.
- Using a custom `serde` implementation, we check if the user-provided value is a tuple based on the `serde` model. If it is, we deserialize it directly. Otherwise, we deserialize the first element of the array (solves #225).
- The payload is then deserialized into the user-provided value. Both parser implementations (common and msgpack) contain custom implementations of `serde::{Serialize, Deserialize}`, which wrap `serde_json` and `rmp_serde`. These wrappers handle the following:
  - For the common parser only, it reinjects all binary data into the appropriate fields (solves #275).
  - For all parsers, it skips the first element of the root-level array since that is the event field.
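To illustrate the deferred-parsing idea, here is a minimal std-only sketch of how an event name could be borrowed from a raw JSON payload without deserializing the rest. The trait shape, the `CommonParser` struct, and the `read_event` signature are assumptions made for this example; the actual `Parse` trait in `socketioxide_core` may look different.

```rust
/// Hypothetical, simplified stand-in for the core `Parse` trait.
trait Parse {
    /// Borrow the event name from a raw, still-serialized payload.
    fn read_event<'a>(&self, payload: &'a str) -> Option<&'a str>;
}

/// Hypothetical parser for the default (JSON text) format.
struct CommonParser;

impl Parse for CommonParser {
    fn read_event<'a>(&self, payload: &'a str) -> Option<&'a str> {
        // The payload must start with `[` followed by a JSON string.
        let rest = payload.trim_start().strip_prefix('[')?;
        let rest = rest.trim_start().strip_prefix('"')?;

        // Scan to the closing quote, skipping escaped characters.
        let mut escaped = false;
        for (i, c) in rest.char_indices() {
            match c {
                _ if escaped => escaped = false,
                '\\' => escaped = true,
                // Return the raw slice; escape sequences are NOT decoded.
                '"' => return Some(&rest[..i]),
                _ => {}
            }
        }
        None // unterminated string
    }
}
```

With this sketch, `CommonParser.read_event(r#"["chat message", {"text": "hi"}]"#)` returns `Some("chat message")` as a slice of the input, so routing needs no allocation and the arguments stay untouched until the handler's type is known.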
Serialization
- We directly encode the user-provided value and the event using the given parser. If the value is tuple-like, we serialize it as multiple arguments (with the event at the front of the argument list) (solves #225).
- For the common parser, if any binary payloads are included, we extract them into a separate array and replace the binaries with placeholders (solves #275).
- We then include the pre-serialized payload in the packet that is being serialized.
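The placeholder step for the common parser can be sketched as follows. This is a simplified std-only model, not the wrapper around `serde_json` used in the PR: the `Value` enum, `write_json`, and `encode_event` are hypothetical names, and strings are assumed to need no JSON escaping. The placeholder shape `{"_placeholder":true,"num":N}` is the one used by the socket.io protocol for binary attachments.

```rust
/// Hypothetical value model with a dedicated binary variant.
enum Value {
    Str(String),
    Bin(Vec<u8>),
    Array(Vec<Value>),
    Map(Vec<(String, Value)>),
}

/// Write `value` as JSON text, moving each binary blob into `bins` and
/// leaving a socket.io placeholder object in its place.
fn write_json(value: Value, out: &mut String, bins: &mut Vec<Vec<u8>>) {
    match value {
        // Assumption: the string contains nothing that needs JSON escaping.
        Value::Str(s) => {
            out.push('"');
            out.push_str(&s);
            out.push('"');
        }
        Value::Bin(bytes) => {
            // `num` is the index of the blob in the attachment list.
            out.push_str(&format!("{{\"_placeholder\":true,\"num\":{}}}", bins.len()));
            bins.push(bytes);
        }
        Value::Array(items) => {
            out.push('[');
            for (i, item) in items.into_iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_json(item, out, bins);
            }
            out.push(']');
        }
        Value::Map(fields) => {
            out.push('{');
            for (i, (key, item)) in fields.into_iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                out.push('"');
                out.push_str(&key);
                out.push_str("\":");
                write_json(item, out, bins);
            }
            out.push('}');
        }
    }
}

/// Encode `[event, data]`, returning the JSON text and the extracted blobs.
fn encode_event(event: &str, data: Value) -> (String, Vec<Vec<u8>>) {
    let mut out = String::new();
    let mut bins = Vec::new();
    let root = Value::Array(vec![Value::Str(event.to_string()), data]);
    write_json(root, &mut out, &mut bins);
    (out, bins)
}
```

Because the walk is recursive, a binary value nested inside a map or an inner array is extracted the same way as one at the top level, which is the behavior #275 asks for.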
Drawbacks of this solution
- For the common parser, binary payloads are currently cloned during both serialization and deserialization because the `serde` model uses `Vec<u8>` rather than `Bytes` or other structs that are cheap to clone. This is the primary issue with the current solution, though it only affects `serde_json`; `msgpack` handles binary data natively.
- It is not possible to deserialize/serialize an unknown variadic number of arguments. You must know the number of arguments you are sending and receiving, whereas in JavaScript you can emit any unknown number of arguments (e.g., `socket.emit(...new Array(Math.floor(Math.random() * 1000)))`).
To Do
- [ ] Add CI/CD pipelines for msgpack parser
- [ ] Add documentation
- [ ] Test and document behavior for using `serde_json::Value` when deserializing payloads with binaries
- [x] Add more unit testing for parsers (`read_event`, ...)
- [x] Add fuzz testing ?
- [x] Add benchmarks for the msgpack parser.
- [x] Add a feature flag for msgpack parser
Closes:
- #275
- #276 (Bin payload is removed)
- #225
- #234