ENH: Using msgpack instead of json

Open waldbauer-certat opened this issue 3 years ago • 0 comments

NOTE: This is a proof of concept and is still being heavily tested!

Introduction

Msgpack (MessagePack) is a (de)serialization format that is similar to JSON, but more optimized for m2m (machine-to-machine) communication. There are certainly better protocols like protobuf, flatbuffers, capnproto, SBE and so on, but these don't fit into intelmq very well. Msgpack uses a key-value pattern (like JSON), so there won't be any major change. The real "magic" is in how the data is stored: JSON is very human-readable because it serializes to text, whereas msgpack packs data into a binary format, which results in smaller size and faster processing (see the benchmark below). For details on the format, see the MessagePack specification.
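To make the "binary packing" idea concrete, here is a minimal sketch of a hand-written encoder for a small subset of MessagePack (nil, booleans, non-negative integers up to 32 bits, short strings, small maps). It is only an illustration of the wire format, not the encoder intelmq or the `msgpack` library uses, and the sample event with its field values is hypothetical.

```python
import json
import struct


def pack(obj):
    """Encode a small subset of Python values as MessagePack bytes."""
    if obj is None:
        return b"\xc0"                                   # nil
    if obj is True:
        return b"\xc3"                                   # true
    if obj is False:
        return b"\xc2"                                   # false
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return struct.pack("B", obj)                 # positive fixint: one byte
        if 0 <= obj <= 0xFF:
            return b"\xcc" + struct.pack("B", obj)       # uint 8
        if 0 <= obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)      # uint 16, big-endian
        if 0 <= obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)      # uint 32, big-endian
        raise ValueError("integer not supported by this sketch")
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return struct.pack("B", 0xA0 | len(raw)) + raw   # fixstr: length in the tag byte
        if len(raw) <= 0xFF:
            return b"\xd9" + struct.pack("B", len(raw)) + raw    # str 8
        if len(raw) <= 0xFFFF:
            return b"\xda" + struct.pack(">H", len(raw)) + raw   # str 16
        raise ValueError("string too long for this sketch")
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("only fixmap (up to 15 entries) is supported here")
        out = struct.pack("B", 0x80 | len(obj))          # fixmap: entry count in the tag byte
        for key, value in obj.items():
            out += pack(key) + pack(value)
        return out
    raise TypeError(f"type not supported by this sketch: {type(obj).__name__}")


# Hypothetical event; the field values are made up for illustration.
event = {
    "feed.name": "example-feed",
    "source.network": "192.0.2.0/24",
    "extra.count": 300,
}

packed = pack(event)
as_json = json.dumps(event, separators=(",", ":")).encode("utf-8")

print("msgpack bytes:", len(packed))
print("json bytes:   ", len(as_json))
```

The saving comes from dropping quotes, colons, commas and braces: lengths and types live in a single tag byte, and small integers are stored as raw bytes instead of decimal digits.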

Msgpack itself is available for multiple languages, such as Go, Python, JavaScript, PHP and so on.

In addition, Redis, our internal message queue, is also capable of using msgpack within its Lua API.

What's the goal?

  • [x] Faster processing time for (de)serialization
  • [x] Smaller memory footprint
  • [x] No breaking change

Benchmark

For the benchmark, data was extracted from spamhaus-drop-collector, parsed by spamhaus-drop-parser and measured in deduplicator-expert. 460 events were processed in total.

I've tested the bots above and they worked fine with this change. It might break other bots, which I haven't tested yet.
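For readers who want to reproduce a comparison like this outside of a bot, the following is a rough sketch of a timing harness. It is not the script used for the numbers in this issue (those were measured inside deduplicator-expert); the helper `median_ns`, the sample event and the repeat count are assumptions. The third-party `msgpack` package is optional here: if it is not installed, only the JSON row is produced.

```python
import json
import statistics
import time

try:
    import msgpack  # third-party package, e.g. installed with `pip install msgpack`
except ImportError:
    msgpack = None


def median_ns(func, arg, repeat=1000):
    """Return the median wall-clock time of func(arg) in nanoseconds."""
    samples = []
    for _ in range(repeat):
        start = time.perf_counter_ns()
        func(arg)
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples)


# Hypothetical event; real events from the parser will differ.
event = {
    "feed.name": "example-feed",
    "source.network": "192.0.2.0/24",
    "classification.type": "spam",
}

json_bytes = json.dumps(event).encode("utf-8")
results = {
    "json": {
        "size": len(json_bytes),
        "serialize_ns": median_ns(json.dumps, event),
        "deserialize_ns": median_ns(json.loads, json_bytes),
    }
}

if msgpack is not None:
    packed = msgpack.packb(event)
    results["msgpack"] = {
        "size": len(packed),
        "serialize_ns": median_ns(msgpack.packb, event),
        "deserialize_ns": median_ns(msgpack.unpackb, packed),
    }

for name, row in results.items():
    print(name, row)
```

Absolute timings depend heavily on the machine, the Python version and the shape of the events, so only the relative difference between the two rows is meaningful.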

| Type | Median data size |
|---|---|
| JSON | 387 bytes |
| MSGPACK | 329 bytes |
| Diff | 58 bytes (16.20%) |

Serialize

| Type | Median execution time in ns |
|---|---|
| JSON | 39286 |
| MSGPACK | 23483 |
| Diff | 15803 (50.35%) |

Deserialize

| Type | Median execution time in ns |
|---|---|
| JSON | 23483 |
| MSGPACK | 12602 |
| Diff | 10881 (80.62%) |
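The issue does not say how the "Diff" percentages were derived. The size and serialize rows are consistent with the absolute difference divided by the mean of the two values, so the sketch below assumes that formula; the function name `diff` is made up for this example. Under the same assumption the deserialize row comes out at roughly 60%, so the percentage listed for it may have been computed differently.

```python
def diff(json_value, msgpack_value):
    """Absolute difference and percentage relative to the mean of both values."""
    absolute = json_value - msgpack_value
    mean = (json_value + msgpack_value) / 2
    return absolute, round(100 * absolute / mean, 2)


# Median values copied from the tables above.
rows = {
    "size (bytes)": (387, 329),
    "serialize (ns)": (39286, 23483),
    "deserialize (ns)": (23483, 12602),
}

for name, (json_value, msgpack_value) in rows.items():
    absolute, percent = diff(json_value, msgpack_value)
    print(f"{name}: {absolute} ({percent}%)")
```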

To sum up, changing from JSON to msgpack results in faster (de)serialization and a lower memory footprint.

waldbauer-certat, Mar 17 '21 13:03