Binary Protocols for disk storage and websocket
Why
Right now, both the websocket protocol and file persistence use JSON. This is fine for most use cases, but in the Hydra Doom project it quickly became a major bottleneck. In just a few days, the nodes produced over 10 terabytes of on-disk state, and the JSON encoding inflated our ~480-byte transactions to several kilobytes each; at 200 transactions per second per node, that overhead adds up quickly.
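To put rough numbers on this, here is a back-of-the-envelope sketch in Python. The 480-byte size and the 200 transactions per second come from the figures above; the 3,000-byte JSON size is only an illustrative assumption standing in for "several kilobytes", not a measurement.

```python
# Figures from the issue text.
RAW_TX_BYTES = 480        # approximate size of one transaction
TPS_PER_NODE = 200        # transactions per second per node
SECONDS_PER_DAY = 86_400

# ASSUMPTION: stand-in for "several kilobytes"; not a measured value.
ASSUMED_JSON_TX_BYTES = 3_000


def bytes_per_day(tx_bytes: int, tps: int = TPS_PER_NODE) -> int:
    """Bytes written per node per day at a constant transaction rate."""
    return tx_bytes * tps * SECONDS_PER_DAY


raw_per_day = bytes_per_day(RAW_TX_BYTES)
json_per_day = bytes_per_day(ASSUMED_JSON_TX_BYTES)

print(f"raw:  {raw_per_day / 1e9:.2f} GB/day per node")
print(f"json: {json_per_day / 1e9:.2f} GB/day per node (assumed size)")
print(f"overhead factor: {json_per_day / raw_per_day:.2f}x")
```

Under that assumption, the raw transactions alone would account for roughly 8.3 GB per node per day, and the JSON-encoded form for roughly 51.8 GB.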
What
It would be useful to have binary protocols for both of these, for scenarios that need to squeeze as much performance as possible out of the hydra node.
How
I'm not sure how this interplays with the plans to potentially use something like a postgres backend for persistence, but the source/sink APIs we recently contributed would be well suited for providing alternative implementations of these protocols.
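As a rough illustration of what "alternative implementations" could look like, here is a minimal Python sketch of a pluggable sink with a JSON variant and a binary variant. All names here (`EventSink`, `put_event`, `JsonLinesSink`, `BinarySink`, `persist`) and the record layout are hypothetical; they are not the actual hydra-node source/sink API, which is written in Haskell.

```python
import io
import json
import struct
from typing import BinaryIO, Protocol


class EventSink(Protocol):
    """Hypothetical sink interface: accepts one event at a time."""

    def put_event(self, event_id: int, payload: bytes) -> None: ...


class JsonLinesSink:
    """Writes one JSON object per line, with the payload hex-encoded."""

    def __init__(self, out: BinaryIO) -> None:
        self.out = out

    def put_event(self, event_id: int, payload: bytes) -> None:
        line = json.dumps({"id": event_id, "payload": payload.hex()})
        self.out.write(line.encode("utf-8") + b"\n")


class BinarySink:
    """Writes length-prefixed records: u64 id, u32 length, raw payload."""

    def __init__(self, out: BinaryIO) -> None:
        self.out = out

    def put_event(self, event_id: int, payload: bytes) -> None:
        # Big-endian, no padding: 8-byte id followed by 4-byte length.
        self.out.write(struct.pack(">QI", event_id, len(payload)) + payload)


def persist(sink: EventSink, events: list[tuple[int, bytes]]) -> None:
    """The caller only depends on the sink interface, not the encoding."""
    for event_id, payload in events:
        sink.put_event(event_id, payload)


# Usage: same events, two encodings.
events = [(1, bytes(480)), (2, bytes(480))]
json_out, bin_out = io.BytesIO(), io.BytesIO()
persist(JsonLinesSink(json_out), events)
persist(BinarySink(bin_out), events)
print(len(json_out.getvalue()), len(bin_out.getvalue()))
```

The point of the sketch is only that the encoding can be swapped behind one interface, so a binary format would not have to replace JSON outright.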
(Perhaps this should have been a discussion before an issue, whoops)
@Quantumplation That's fine. Thanks for contributing this idea. The purpose is clear: "Reduce JSON overhead".
What we should do about it is a bit less so. I currently see at least one drawback: unlike the human-readable form, a binary encoding is impossible to debug without additional tooling.
Maybe an alternative way to achieve the same purpose (or at least take one step toward it) would be to store only the event stream (state file) and have the API outputs be an interpretation/subset of those events. We should track this as a separate idea (with a similar purpose), though.
Following up on this: now that we've reduced the memory overhead of running a node, I think this issue can just focus on more condensed options for API interaction.
I note that we could simply use a CBOR-style encoding, which we already have a fair bit of, or we could do much more work and use gRPC/protobufs.
Do you have thoughts on what might be preferred? Or should we aim to support all three (JSON, CBOR, and gRPC/protobufs) and leave it up to the consumer to decide?
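For a sense of what the CBOR option buys on the wire, here is a small Python sketch comparing a 480-byte payload sent as a hex string inside JSON with the same payload sent as a CBOR byte string. The `cbor_bytes` helper is hand-written for illustration and covers only byte strings (major type 2 in RFC 8949); the `"tx"` key and the all-zero payload are placeholders, not the real API message shape.

```python
import json
import struct


def cbor_bytes(data: bytes) -> bytes:
    """Encode `data` as a definite-length CBOR byte string (major type 2)."""
    n = len(data)
    if n < 24:
        head = bytes([0x40 | n])               # length fits in the initial byte
    elif n < 2**8:
        head = bytes([0x58, n])                # 1-byte length follows
    elif n < 2**16:
        head = b"\x59" + struct.pack(">H", n)  # 2-byte length follows
    elif n < 2**32:
        head = b"\x5a" + struct.pack(">I", n)  # 4-byte length follows
    else:
        head = b"\x5b" + struct.pack(">Q", n)  # 8-byte length follows
    return head + data


tx = bytes(480)  # placeholder payload of the size mentioned in this issue

# JSON cannot carry raw bytes, so the payload is hex-encoded (2 chars per byte).
as_json = json.dumps({"tx": tx.hex()}).encode("utf-8")
as_cbor = cbor_bytes(tx)

print(len(as_json), len(as_cbor))  # 970 483
```

Hex encoding alone roughly doubles the payload, before any of the surrounding JSON structure is counted. A protobuf `bytes` field would carry the payload raw in a similar way, at the cost of maintaining a schema and code generation.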