
CPU time?

Open zellyn opened this issue 2 years ago • 4 comments

Anecdotally, I've heard of servers spending significant portions of their CPU on serialization and deserialization. It's true that the network is much slower, but it's still a concern. I'm curious how your format stacks up against protobuf, etc. as well as against the newer, very fast JSON libraries.

zellyn avatar Mar 25 '22 15:03 zellyn

One of our engineers commented:

> Some of these benchmarks are really tough to make representative. For example, some benchmarks test going from an object model to a JSON string, but in practice it'll always be bytes ultimately on the wire. This misses the cost of UTF-8 encoding and allocations, which tend to dominate real-world performance.
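To illustrate the point (a minimal sketch in Python, not tied to any particular benchmark suite): timing serialization only up to a `str` stops short of what actually goes on the wire, because the UTF-8 encode step is a separate pass with its own allocation.

```python
# Sketch: benchmarking to a string vs. to wire-ready bytes.
# The extra .encode("utf-8") pass is what a string-only benchmark misses.
import json
import timeit

doc = {"name": "Jöhn Dœ", "tags": ["α", "β"] * 50, "count": 42}

to_str = timeit.timeit(lambda: json.dumps(doc), number=10_000)
to_bytes = timeit.timeit(lambda: json.dumps(doc).encode("utf-8"), number=10_000)

print(f"object -> str:   {to_str:.3f}s")
print(f"object -> bytes: {to_bytes:.3f}s")  # includes UTF-8 encoding
```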

zellyn avatar Mar 25 '22 20:03 zellyn

Hey @zellyn !

The idea of JSON BinPack and its strong focus on space-efficiency originated in the context of IoT, where companies may deploy devices to extremely remote locations, sending data back to the cloud through very expensive and slow cellular connections. In those cases, the cost of data transmission plus the data transmission speed make things like CPU utilization irrelevant in comparison.

However, as mentioned in https://github.com/jviotti/jsonbinpack/issues/164#issuecomment-1079527311, I am focusing on making JSON BinPack both space-efficient and fast. The pre-production JSON BinPack implementation from this repo is probably slower than mature Protocol Buffers implementations, but I hope for that to change once I finalize the C++ production-ready JSON BinPack implementation.

> Some of these benchmarks are really tough to make representative. For example, some benchmarks test going from an object model to a JSON string, but in practice it'll always be bytes ultimately on the wire. This misses the cost of UTF-8 encoding and allocations, which tend to dominate real-world performance.

That benchmark is exclusively measuring space-efficiency (converting the document into an array of bytes). So it does miss e.g. UTF-8 encoding and allocations, by design! I do expect to produce runtime-efficiency benchmarks soon to close that gap, but I figured it was too soon to do so based on this experimental implementation!
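For context, a space-efficiency measurement of this kind only cares about the byte count of the encoded output, not how long encoding took. A trivial hypothetical sketch (comparing compact vs. pretty-printed JSON, not JSON BinPack itself):

```python
# Sketch: space-efficiency is measured purely as encoded size in bytes.
import json

doc = {"id": 1234, "active": True, "tags": ["iot", "edge"]}

compact = json.dumps(doc, separators=(",", ":")).encode("utf-8")
pretty = json.dumps(doc, indent=2).encode("utf-8")

print(f"compact: {len(compact)} bytes")
print(f"pretty:  {len(pretty)} bytes")  # larger, yet the same document
```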

jviotti avatar Mar 26 '22 00:03 jviotti

That makes sense. Presumably, JSON BinPack would always be turned back into a JSON representation before use? I imagine something like Cap'n Proto or Flatbuffers might have an advantage of being usable in-place (at least if read-only) that would be nice for highly constrained environments. Or is JSON BinPack usable in-place? A quick glance at the documentation markdown leads me to believe the compressed format would be traversable, albeit inconvenient.

zellyn avatar Mar 26 '22 02:03 zellyn

The existing set of rules is indeed not optimized at all for that use case. The architecture of JSON BinPack has been envisioned as a "framework" to define static analysis rules, mapping rules, and encoding rules, while remaining agnostic about what such rules actually accomplish. In this initial implementation, I defined many rules that are specifically applicable to space-efficiency.

However, I'd like to explore writing other complementary sets of rules targeted at making the bit-strings traversable, etc. I was thinking about defining a small custom JSON Schema vocabulary that would then allow you to hint JSON BinPack to optimize certain sub-schemas for certain use cases. For example, you would be able to say: encode this sub-schema for runtime-efficiency (enabling in-place access) while encoding this other part for space-efficiency.
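Such a hint might look roughly like this. Note that `x-jsonbinpack-optimize` is a purely hypothetical keyword for illustration, not part of any real JSON BinPack vocabulary:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "telemetry": {
      "type": "array",
      "items": { "type": "number" },
      "x-jsonbinpack-optimize": "runtime"
    },
    "metadata": {
      "type": "object",
      "x-jsonbinpack-optimize": "space"
    }
  }
}
```

Here the (hypothetical) hint asks for in-place traversal on the hot `telemetry` array while keeping `metadata` as compact as possible.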

So the quick answer is: this is not supported yet, but hopefully it will be, as the architecture was designed to accommodate that too!

jviotti avatar Mar 26 '22 12:03 jviotti