activitypub
[Feature] Support Protobuf or Flatbuffers communication for performance
Hello. Since most ActivityPub instances are run by volunteers around the world, I think we should support communication with Protobuf or Flatbuffers (choosing one of the two).
This way, system and server resource usage is lowered and the services will be faster.
Protobuf is the more developer-friendly option, while Flatbuffers is the extremely fast one.
Thanks.
I imagine the major headache is compatibility with ActivityStreams.
How does one keep JSON-LD's RDF schema compatibility with a protobuf/flatbuffer schema? Does protobuf/flatbuffer already support JSON-LD schemas? Is it technically feasible/tractable? Or would this wire-optimization sacrifice come at the cost of payload interoperability (as in, require ditching ActivityStreams)?
Does protobuf even provide any benefit over HTTP compression for average AS2 payloads? I would imagine that the copious string properties required would negate the majority of the gains.
Furthermore, do we have any evidence that JSON serialization is the main performance bottleneck in a typical implementation? I would imagine optimising database access or fetching dependent resources to be far more important.
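To make the compression point concrete, here is a minimal Python sketch (the activity, actor, and URLs are invented purely for illustration) that gzips a compact AS2 payload. The repeated URL prefixes give gzip plenty to work with, while a protobuf encoding would still have to carry every one of those strings verbatim:

```python
import gzip
import json

# A small, hypothetical AS2 activity (names and URLs invented for illustration).
activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": "https://example.com/activities/1",
    "type": "Create",
    "actor": "https://example.com/users/alice",
    "object": {
        "id": "https://example.com/notes/1",
        "type": "Note",
        "content": "Hello, fediverse!",
        "to": ["https://www.w3.org/ns/activitystreams#Public"],
    },
}

raw = json.dumps(activity, separators=(",", ":")).encode("utf-8")
compressed = gzip.compress(raw)

# Most of the payload is string-valued (IDs, URLs, content), which a protobuf
# encoding would have to carry verbatim anyway; gzip, by contrast, exploits
# the repeated URL prefixes directly.
print(f"raw JSON: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
```

The exact numbers depend on the payload, but the shape of the result is the point: once the bytes on the wire are mostly strings, generic compression already captures most of the available savings.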
Hi, the benefits of Protobuf and Flatbuffers are not just smaller messages on the wire but also serialization performance. Both improve it a lot; Flatbuffers even more so, because it can deserialize data with "zero-copy" access.
Thinking about our use case, though, Flatbuffers' zero-copy access would gain us almost nothing, because we usually need to parse the full message and the messages are small, not megabytes. So I now prefer Protobuf.
Of course there are other performance bottlenecks, like the language implementation or the database. Maybe the best combination would be Rust and Scylla (Cassandra rewritten in C++ rather than Java). But that is another discussion; here we are talking about the protocol itself, which recommends a serialization technology, in this case JSON.
@cypherbits I think you're missing the point. A good implementation of protobuf would (probably) be faster, but it comes at a price. Implementation complexity, more required dependencies, and a more complex compatibility/testing matrix just off the top of my head. So the big question is whether the benefits of protobuf outweigh the costs. Making things faster has a very, very marginal benefit if they're already fast enough, and there's little evidence (at least to my knowledge) that that's not the case.
I want to really emphasize how complex a problem the consequences of "use protoc" (protobuf) create. It goes beyond the optimization discussion.
This stems from the fact that the protobuf schema-defining language is not as powerful nor as expressive as the schema that defines JSON-LD, and subsequently ActivityStreams. Since the protobuf language and protoc tool only allow a subset of schema expressiveness, not only is this not "just an optimization plug-and-play with content negotiation", this would require duplicating the ActivityStreams work but for the protobuf definitions. Note that the idea of sharing proto descriptors doesn't overcome this fundamental limitation.
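A toy Python sketch of that expressiveness gap (the dataclass here stands in for a generated protobuf message type; the sample object and its "toot:sensitive" extension term are purely illustrative). JSON-LD's open-world model lets any payload carry extension vocabulary, while a pre-declared schema can only accept fields it already knows about; protobuf can retain unknown binary fields as opaque bytes, but they carry no schema semantics:

```python
import json
from dataclasses import dataclass, fields

# Hypothetical stand-in for a generated protobuf message: every field
# must be declared in the schema ahead of time.
@dataclass
class Note:
    id: str = ""
    type: str = ""
    content: str = ""

# An incoming object using a vocabulary extension ("toot:sensitive" is a
# Mastodon extension term, used here purely as an example).
incoming = json.loads(
    '{"id": "https://example.com/notes/1", "type": "Note",'
    ' "content": "hi", "toot:sensitive": true}'
)

# A generic JSON(-LD) consumer simply keeps the extension property:
assert "toot:sensitive" in incoming

# But a fixed schema can only accept fields declared in advance, so the
# extension has to be filtered out (or the decode rejected) up front:
known = {f.name for f in fields(Note)}
note = Note(**{k: v for k, v in incoming.items() if k in known})
assert [f.name for f in fields(note)] == ["id", "type", "content"]
```

Every new extension term would therefore need a schema change and a regenerate-and-redeploy cycle on the protobuf side, which is exactly the duplicated labor described above.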
If protobufs are accepted and separate schemas for them are maintained independently from ActivityStreams, so that the two share the same concepts but have totally different schemas, then compatibility is going to take a huge hit: folks who want to use {existing RDF, new} types will have to decide which to use. That alone means the schemas will not be kept in sync, a reflection of the community: ActivityStreams might gain a new type whose creator simply didn't add it to the protobuf schema, or vice versa. Schema de-sync.
Then, implementors will look at both and have to pick one to start implementing, and then decide whether to implement the other one. So there will be implementations that either support ActivityStreams, or "ProtoStreams", but not both. Now we have a compatibility problem in the ActivityPub community across different software: both from people that create schemas, and by people that simply want to build software.
I want to call this out because I really think that byte-wire optimizations and language-specific "zero buffer" copy semantics are not worth community fragmentation.
My background: I actually wrote a "protoc"-like tool for go-fed (astool) to solve only the "statically typed language" problem. It reads in an OWL definition (very flexible) and spits out generated Go code, complete with static types. But, importantly, it maintains byte-wire compatibility (JSON)! Since it relies on OWL, I do have to maintain a separate parallel schema from the real ActivityStreams definitions, but the format is flexible enough that keeping it in sync is trivial. So it federates readily and doesn't require duplicating labor if, say, ValueFlows or another RDF ontology joins the fray. In short, I did evaluate protobuf (the schema-defining language) and protoc (the tool) before, and found them insufficient.
Given what I know of protocol buffers and JSON-LD, I don't think this would ever be possible given the extensibility requirements of JSON-LD. Maybe it'd be possible to use something schema-less like messagepack, but even then, just standard HTTP based compression is likely already getting to similar levels for most activities.
There are certainly JSON parsing libraries that can do zero-copy parsing and other efficiencies, too.
This is an interesting topic, for a number of reasons.
It's probably good for our standard to have some flexibility and upgradeability toward other lower-level protocols. I'd compare it to using WebSocket instead of plain old REST for delivery and submission; we don't have easy ways to do that right now. It is probably worthwhile for us to stimulate some experimentation on the topic, as long as it stays backwards compatible.
However, the protobuf structure requires not just a change in the delivery substrate, but also a fixed schema for content, which is a big conceptual jump from the loosey-goosey world of JSON-LD. So this particular application would require even more hard thinking. @dmitrizagidulin recommends looking at https://linkml.io/ as a way to have a common source for JSON-LD and protobuf, which is honestly pretty cool.
I believe the next best step would be developing a FEP and getting some implementations.
Barring new suggestions, I think the next step with this is to make a FEP, so I will close the issue.
Worth noting that a solution like CBOR may be a better fit for a JSON-based vocabulary like Activity Streams 2.0.
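As a rough illustration of why CBOR maps well onto an AS2-style JSON vocabulary, here is a hand-rolled Python sketch of a tiny subset of the RFC 8949 encoding (text strings, arrays, and maps only; the sample activity is invented). CBOR keeps JSON's self-describing structure, so no out-of-band schema is needed, but replaces quotes, braces, and commas with compact length-prefixed headers:

```python
import json

def _head(major, length):
    # CBOR initial byte: 3-bit major type + 5-bit additional info (RFC 8949).
    if length < 24:
        return bytes([(major << 5) | length])
    if length < 256:
        return bytes([(major << 5) | 24, length])
    raise ValueError("sketch only handles lengths < 256")

def cbor_encode(obj):
    # Minimal encoder covering just strings, arrays, and maps -- enough to
    # size a tiny AS2 activity. Real deployments would use a full CBOR library.
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data  # major type 3: text string
    if isinstance(obj, list):
        return _head(4, len(obj)) + b"".join(map(cbor_encode, obj))  # 4: array
    if isinstance(obj, dict):
        out = _head(5, len(obj))  # major type 5: map
        for k, v in obj.items():
            out += cbor_encode(k) + cbor_encode(v)
        return out
    raise TypeError(f"unsupported type: {type(obj).__name__}")

# A hypothetical AS2 activity, invented for illustration.
activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Like",
    "actor": "https://example.com/users/alice",
    "object": "https://example.com/notes/1",
}
as_json = json.dumps(activity, separators=(",", ":")).encode("utf-8")
as_cbor = cbor_encode(activity)
print(f"JSON: {len(as_json)} bytes, CBOR: {len(as_cbor)} bytes")
```

The savings are modest for string-heavy payloads, for the same reason noted earlier in the thread, but unlike protobuf this requires no separate schema and round-trips the JSON data model losslessly.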