intelmq
ENH: Using msgpack instead of json
NOTE: This is a proof of concept and is still being heavily tested!
Introduction
Msgpack (MessagePack) is a (de)serialization format similar to JSON, but optimized for machine-to-machine (m2m) communication. There are arguably more efficient protocols such as protobuf, FlatBuffers, Cap'n Proto and SBE, but they don't fit IntelMQ very well. Msgpack uses the same key-value pattern as JSON, so no major changes are needed. The real "magic" is in how the data is stored: JSON serializes to human-readable text, whereas msgpack packs the data into a binary format, which results in a smaller size and faster processing (see the benchmark below). For details on the format, see the MessagePack specification.
Msgpack itself is available for many languages, including Go, Python, JavaScript and PHP.
In addition, Redis, our internal message queue, can also handle msgpack through its Lua API.
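To make the size difference concrete, here is a minimal sketch of how MessagePack lays out a small map compared to JSON. The `pack` function is a hypothetical, hand-written encoder covering only a tiny subset of the format (nil, booleans, small unsigned integers, short strings, small maps); it is not the code used in this PR, and the sample dictionary is the well-known example from the MessagePack website, not an IntelMQ event. In practice the `msgpack` Python package (`msgpack.packb` / `msgpack.unpackb`) would do this work.

```python
import json
import struct


def pack(obj):
    """Encode a small subset of MessagePack (illustration only)."""
    if obj is None:
        return b"\xc0"                      # nil
    if obj is True:
        return b"\xc3"                      # true (checked before int: bool is an int subclass)
    if obj is False:
        return b"\xc2"                      # false
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return struct.pack("B", obj)    # positive fixint: value is the byte itself
        if 0 <= obj <= 0xFF:
            return b"\xcc" + struct.pack("B", obj)     # uint 8
        if 0 <= obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)    # uint 16, big-endian
        if 0 <= obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)    # uint 32, big-endian
        raise ValueError("integer out of range for this sketch")
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw      # fixstr: length lives in the type byte
        if len(raw) <= 0xFF:
            return b"\xd9" + bytes([len(raw)]) + raw   # str 8
        raise ValueError("string too long for this sketch")
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("map too large for this sketch")
        out = bytes([0x80 | len(obj)])                 # fixmap: entry count lives in the type byte
        for key, value in obj.items():
            out += pack(key) + pack(value)
        return out
    raise TypeError(f"unsupported type: {type(obj).__name__}")


sample = {"compact": True, "schema": 0}
as_json = json.dumps(sample, separators=(",", ":")).encode("utf-8")
as_msgpack = pack(sample)

print(len(as_json), as_json)          # 27 bytes of text
print(len(as_msgpack), as_msgpack.hex())  # 18 bytes of binary
```

The saving comes from dropping quotes, colons, commas and braces: the map header and each string header take a single byte that carries both the type and the length.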
What's the goal?
- [x] Faster (de)serialization
- [x] Smaller memory footprint
- [x] No breaking changes
Benchmark
For the benchmark, data was fetched by spamhaus-drop-collector, parsed by spamhaus-drop-parser and measured in deduplicator-expert. 460 events were processed in total.
I've tested the bots above and they work fine with this change. It might still break other bots, which I haven't tested yet.
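For readers who want to reproduce this kind of measurement, the following is a minimal sketch of taking median timings in nanoseconds. It is not the instrumentation used for the numbers in this PR: `median_ns` is a hypothetical helper, the event is made-up sample data, and only the standard-library `json` module is timed. Assuming the `msgpack` package is installed, `msgpack.packb` and `msgpack.unpackb` could be passed to the same helper.

```python
import json
import statistics
import time


def median_ns(func, arg, runs=1000):
    """Return the median wall-clock time of func(arg) in nanoseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        func(arg)
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples)


# Hypothetical event; field values are made up for illustration.
event = {
    "feed.name": "example-feed",
    "source.network": "192.0.2.0/24",
    "classification.type": "spam",
}
encoded = json.dumps(event)

print("JSON serialize  :", median_ns(json.dumps, event), "ns")
print("JSON deserialize:", median_ns(json.loads, encoded), "ns")
```

The median is used rather than the mean so that occasional outliers (garbage collection, scheduler jitter) do not dominate the result.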
Type | Median data size |
---|---|
JSON | 387 bytes |
MSGPACK | 329 bytes |
Diff | 58 bytes (16.20%) |
Serialize
Type | Median execution time in ns |
---|---|
JSON | 39286 |
MSGPACK | 23483 |
Diff | 15803 (50.35%) |
Deserialize
Type | Median execution time in ns |
---|---|
JSON | 23483 |
MSGPACK | 12602 |
Diff | 10881 (80.62%) |
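The percentages in the Diff rows for data size and serialization match the absolute difference divided by the mean of the two values. That formula is an inference from the numbers, not something stated in the benchmark itself, and `symmetric_diff_percent` is a hypothetical helper name. A short sketch for checking the rows:

```python
def symmetric_diff_percent(a, b):
    """Absolute difference relative to the mean of a and b, in percent."""
    return abs(a - b) / ((a + b) / 2) * 100


# (JSON, MSGPACK) median values as listed in the tables
rows = {
    "data size (bytes)": (387, 329),
    "serialize (ns)": (39286, 23483),
    "deserialize (ns)": (23483, 12602),
}

for name, (json_value, msgpack_value) in rows.items():
    diff = json_value - msgpack_value
    percent = symmetric_diff_percent(json_value, msgpack_value)
    print(f"{name}: diff={diff}, {percent:.2f}%")
```

Under this assumption, the listed deserialization values give roughly 60.31% rather than the percentage shown in the table, so that row may be worth rechecking against the raw measurements.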
To sum up, switching from JSON to msgpack results in faster (de)serialization and a smaller memory footprint.