server
server copied to clipboard
Consider using msgspec library for structured message serialization
Today I stumbled upon this library for doing json encoding/decoding in Python. Rewriting our message handling to use it could have many benefits, most importantly making it easier to document what the message format actually looks like as each message would become a class with explicit fields.
The Library: https://github.com/jcrist/msgspec
Features we need that it has:
- Supports tagged unions. This is how our message format currently distinguishes between commands.
- Supports skipping serialization of default values.
We currently filter out
None
values in a few places by copying the message before serializing it. Having this feature built in could make the code simpler and improve performance. We would just need to see how we want to handle this with non-None default values. - Can encode into bytearrays: This can make appending the newline we need more efficient
Other features:
- Supports msgpack *Maybe a plus? I recall there was an attempt to use msgpack before because it can be a more efficient representation than JSON, however, using inefficient representations hasn't really been a problem for us in the past (see QDataStreamProtocol :P)
I think this is definitely a step up from what we currently have.
Although I personally have always like the idea of some cross language specification like google protobuf that provides a json or binary format. Mainly with the idea that it would help lift a lot of the burden off people who want to make implementations that interact with the server in their language of choice. If this has that capability then definitely. Of course just having a dedicated message package with the classes listed clearly with their fields already helps with that.
I know you have talked about being wary of any extra build steps which is understandable but figured I would throw it out there since this is labeled proposal.
msgpack was discussed ~2017 and we decided to not use it. JSON is simpler to read and process across plattforms (also with outlook for web, websocket, stomp). We are not in a situation where every byte counts.
After messing around with this library for a bit I ran into a few things that made me not want to use it:
- It's a pretty small project. Not a lot of developers or PR's or issues on the repo.
- It's not stable. Still in 0.x version numbers means it could be changed a lot or contain many bugs.
- I ran into some random hard crashes. There were a few times when the server crashed with an error message from
realloc
, so probably there is a critical bug in the library. I was unable to figure out how to reproduce it though, and it only happened a few times. - It's written in C. The core functionality is all hand-rolled in C to make it fast. However, I looked at the code and there are gotos everywhere and the JSON serialization is totally custom written by the library author. I'd be a lot more comfortable with the project if it was for example written in safe Rust using the well tested serde_json library.
- It's lacking some features that I'd want. This could potentially be resolved by making issues or contributing back to the library, but given my previous point, I don't see myself doing this anyway. (I'd like to be able to have optional structured deserialization where it would default back to the standard json library behavior if no matching struct was found, and I'd like to be able to access the value of the
tag
parameter for tagged union structs after they've been created)