protobuf Opensource C++ zero-copy API

Protobuf has zero-copy support to avoid copying string/bytes fields when parsing protobuf messages and it's used pretty much everywhere inside Google, but the feature has never made its way into the opensource repo. Now protobuf 3.0.0 is released and we will probably have more time to look into incremental improvements. The zero-copy API is a good candidate to be included in the next 3.x release.

Opensourcing the zero-copy API will involve:

1. Open-source the related string/buffer classes (Cord and its dependencies).
2. Un-exclude zero-copy APIs from the message interface (such as ParseFromStringWithAliasing).
3. Un-exclude the support for ctype = STRING_PIECE and ctype = CORD.

(1) is probably the most difficult part as that's a large chunk of code and it may not be portable.

Jul 29 '16 23:07 xfxyjwf

Any updates on this feature?

Feb 15 '17 17:02 jjyao

@jjyao This unfortunately hasn't made it onto our agenda yet. If this feature is useful to you, can you post your use case here and estimate how much it would help? More concrete use case example can help us prioritize it.

Feb 15 '17 19:02 xfxyjwf

I'm also quite interested in this feature.

> More concrete use case example can help us prioritize it.

@xfxyjwf I'm writing an application-specific database server with gRPC and RocksDB. I want to:

- Accept serialized protos from clients through gRPC and store them in RocksDB verbatim, without parsing and constructing a full object in memory.
- Retrieve serialized protos from RocksDB and send them back to clients without parsing and reserializing them, ideally as part of another proto, which would serialize only what's necessary.

I want this because parsing and serialization currently take ~30% of my total response time and I don't really need them.

Here's a flame graph profile that shows what I'm seeing.
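The pass-through described above can be sketched at the wire-format level. This is a minimal sketch, not protobuf's API: `AppendVarint` and `AppendEmbedded` are hypothetical helpers, and the field number is whatever the outer message assigns to the embedded message. It relies only on the documented encoding, where a sub-message is a length-delimited field (wire type 2): a varint tag, a varint length, then the raw bytes.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Appends `value` as a base-128 varint, least-significant group first.
void AppendVarint(std::uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Appends an already-serialized message as a length-delimited field
// (wire type 2) without parsing it. Hypothetical helper for illustration.
void AppendEmbedded(std::uint32_t field_number, std::string_view serialized,
                    std::string* out) {
  AppendVarint((static_cast<std::uint64_t>(field_number) << 3) | 2, out);
  AppendVarint(serialized.size(), out);
  out->append(serialized.data(), serialized.size());
}
```

The stored bytes are still copied once into the output buffer, but no object tree is built and nothing is reserialized.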

Jun 24 '17 01:06 bobobo1618

Is this the thing that Cap'n Proto does that makes it faster than protobuf?

Jul 02 '17 09:07 stellanhaglund

@stellanhaglund No, it's not the main cause of the performance difference. Cap'n Proto is very similar to FlatBuffers, and what I described in https://github.com/google/protobuf/issues/3296 can be said of Cap'n Proto as well.

Jul 05 '17 17:07 xfxyjwf

I am very interested in this feature. I have been suggesting at my work that we adopt something like protobuf for a long time. One of the major objections has been the lack of a way to zero-copy large binary/string values. This matters because we have many applications where an extra copy or two of the data means the processors or the memory bus are saturated.

Our usual processing stream for data looks a lot like:

1. DMA from the network interface to shared memory
2. pass off the shared memory reference to the process(es) that do the calculations
3. calculations done from shared memory to shared memory
4. pass the shared memory reference to further process(es)
5. DMA out of the machine

Control messages and metadata are small enough that copying is no problem (and in fact encoding as JSON etc. is usually good enough). Typical data is large matrices (think 16+ MB) of complex integers (often 16-bit), complex IEEE binary16 (half), or complex IEEE binary32 (float), while the metadata may be 64 bytes in total, encoded as a struct. Note that we often also require the data to be machine-vector aligned (typically 32-byte alignment). A "slow" data rate is 3-5 Gigabit/s.

It'd be great if we could encode such data as something like protobuf and not have to manually maintain readers and writers and representations in multiple languages. We are already making an effort to use protobuf for control data, which IMO it already excels at.
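The 32-byte alignment requirement mentioned above is worth making concrete, because a payload that follows a variable-length header inside one buffer will generally not land on a vector boundary by itself. A minimal sketch, assuming power-of-two alignments; `PaddingFor` and `IsAligned` are hypothetical helper names, not part of any library discussed here:

```cpp
#include <cstddef>
#include <cstdint>

// Bytes of padding needed so that `offset` becomes a multiple of `alignment`.
// `alignment` must be a power of two (e.g. 32 for 256-bit vector loads).
constexpr std::size_t PaddingFor(std::size_t offset, std::size_t alignment) {
  return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}

// True if `p` is aligned to `alignment` bytes (power of two).
inline bool IsAligned(const void* p, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

A zero-copy view into a received buffer is only usable for vector loads if `IsAligned` holds for its start address; otherwise the producer has to pad, or the consumer has to copy after all.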

Sep 26 '17 15:09 johnfb

Perhaps Cord will / could be open sourced as part of the Abseil library. The initial release doesn't include it, although there is a passing mention in malloc_extension.h.

Sep 28 '17 09:09 arthur-tacca

@arthur-tacca Yep. The Cord type will be included as part of Abseil. And after we migrate to use Abseil, supporting zero-copy ctypes should be straightforward.

Sep 28 '17 17:09 xfxyjwf

Hello, I ran into some performance problems at my previous HFT job and thought it would be nice to have a zero-copy, heap-free protobuf parser.

If I were going to hand write code that parsed a specific protobuf schema, I'd typically do all my processing on the stack and consume all data in one pass.

I could see a functional, template-heavy, low-level C++ decoder giving me the same performance. I would best describe it as: X (name proposals welcome) is to SAX as regular protobuf bindings are to DOM.

- instead of heap-allocating a std::string, you get a std::string_view.
- if you don't care about a field, you should be able to say so at compile time.
- if you don't want to parse a subobject, you should be able to skip it or even break out of parsing.

On the generation side I could see doing something similar.

- it should be possible for a message routing app to pass payloads without decoding them

Is there interest for this kind of thing? My fear is that C++ guys that really care about performance would avoid protobuf anyway. I guess my target audience are skilled C++ devs worried about performance forced to speak protobuf for historical reasons or a contract with outside components.

Does anyone have a spiffy name?

Has anyone seen something like this? I found lots of alternative wire formats with language bindings: SBE, CapNProto, etc.

Apr 24 '18 12:04 chris-hite

I don't think we would switch to using a SAX-like parser except maybe in some very specific circumstances in our project. For us, the overhead of most PBs is negligible (and I expect to be even lower when we switch to using arenas). The main exception is the std::string allocation of lots of tiny strings -- we're stuck on the pre-C++11 ABI, so every string ends up being a heap allocation/free pair.

Apr 24 '18 16:04 toddlipcon

I can't use this library without that feature at all!

I use an arena, because I store sensitive key material in my protobuf messages and I provided an allocator with safe memory to the arena (sodium_malloc, not swapped out, zeroed out on free, guard pages etc.).

Given that the key material is stored in bytes fields, protobuf allocates them on the heap in std::string and completely bypasses the safe memory that I want the keys to reside in.

I already halfway ported my code from protobuf-c to protobuf, only now finding out that all my key material completely bypasses the arena. So now it seems like I have to throw that away and stick with protobuf-c (which makes me really unhappy).

May 12 '18 16:05 FSMaxB

Any updates on this feature?

Jun 12 '18 07:06 tianyapiaozi

I think string_view should be a solid contender to be fully released soon.

Cords are a much more heavyweight type. Integrating ZCIS with Cords ties our most basic library directly to ABSL. We tread a little more carefully here.

Aug 06 '18 20:08 gerben-s

@gerben-s Could you please elaborate what ZCIS/Cord and StringView are with respect to zero-copy?

Aug 07 '18 17:08 MyUmmaGumma

Zero-copy parsing of strings can be achieved by aliasing string_views or Cords with the underlying buffer. Cord is a heavyweight type from the absl lib, which needs to be directly supported by our ZeroCopyInputStream (ZCIS) abstraction.
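To illustrate why the stream abstraction matters for aliasing, here is a minimal sketch that assumes the input arrives as separate chunks. `ChunkedInput` and `TryAlias` are hypothetical names, not protobuf or Abseil APIs. A value that lies inside one chunk can be exposed as a `std::string_view`; a value that straddles two chunks cannot be a single contiguous view, which is where a multi-chunk type such as Cord (or a copy) would be needed.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for input that arrives as separate buffers.
struct ChunkedInput {
  std::vector<std::string> chunks;
};

// Tries to alias `len` bytes starting at `offset` within chunk `chunk`.
// Returns true and sets `*out` only if the range fits inside that one chunk.
bool TryAlias(const ChunkedInput& in, std::size_t chunk, std::size_t offset,
              std::size_t len, std::string_view* out) {
  if (chunk >= in.chunks.size()) return false;
  const std::string& c = in.chunks[chunk];
  if (offset > c.size() || len > c.size() - offset) return false;
  *out = std::string_view(c.data() + offset, len);
  return true;
}
```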

Aug 20 '18 20:08 gerben-s

@FSMaxB On the level of safety I understand your wishes, but it's hard for us to make any such guarantee about not storing memory on the heap.

If you have such stringent security demands, I think C++ protobuf is not the right fit.

We are thinking about how to expose aliasing but we want to be careful and expose the right API.

Sep 10 '18 20:09 gerben-s

> We are thinking about how to expose aliasing but we want to be careful and expose the right API.

That makes sense, especially without std::string_view/std::span

Sep 11 '18 16:09 FSMaxB

The only way to go for Google would probably be abseil, but that doesn't go well with semantic versioning.

Sep 11 '18 16:09 FSMaxB

My project, https://github.com/google/nucleus, would very much like this feature.

Nucleus is a package for reading and writing genomics data. It relies on another package called htslib to parse some of the more complicated formats like VCF. Unfortunately, htslib insists on putting the parsed data into memory structures it allocates itself, which leaves us the task of copying that data into protocol buffers.

On benchmarks reading a 100M gz-compressed VCF file, this extra copying causes our C++ reader to be almost twice as slow (20 seconds vs. 11 seconds) as another open-source package for reading VCF files (that doesn't rely on protocol buffers and can thus use the htslib allocated memory directly).

Jan 04 '19 20:01 ThomasColthurst

https://groups.google.com/forum/#!topic/abseil-io/JzrwSIE_ZSo

With Cord coming soon hopefully this can be unblocked.

Jan 04 '19 20:01 prem-nuro

Any update on the addition of std::string_view/zero-copy support? That would be really useful. I have a client sending data buffers to a server via gRPC, and right now I can send data only as string/bytes fields in protobufs. The client keeps the data buffers around until the RPC has completed successfully, implying the server has received the data buffer. It would be great to have string_view support in protobufs, so that the client doesn't have to make string copies of these buffers. The buffers are at least a few MB each, and data-transfer throughput has to be reduced to account for the memory overhead of this copy.

May 27 '19 05:05 msn-tldr

@msn-tldr If you use the gRPC async API on the client side then you don't need to keep the request object in memory while the call is in progress.

May 27 '19 21:05 arthur-tacca

@arthur-tacca I am referring to this async-client example code. Do you mean that after line 72 (the PrepareAsyncSayHello call below), the request can be deallocated (say, if it was on the heap)?

std::unique_ptr<ClientAsyncResponseReader<HelloReply> > rpc( stub_->PrepareAsyncSayHello(&context, request, &cq));

Even if that is what you mean, wouldn't gRPC keep a copy of this request internally (including the data buffer)? Then I still have two copies of the buffer, one with gRPC and one with the client app, which keeps it so that it can resend the buffer if the write fails.

May 29 '19 06:05 msn-tldr

@msn-tldr Regarding freeing the request object: That is correct. Indeed I asked for an almost identical issue to be clarified in the docs in grpc/grpc.github.io#774

Regarding copying to a buffer: That is correct, and is the nature of protocol buffers. When you want to serialise them, they are serialised to a buffer that includes a complete copy of any bytes objects contained within them.

Another important point: If you have a std::string object and you want to set a member of a protocol buffer object to it, you can use move semantics to avoid a copy i.e. my_protobuf.set_myfield(std::move(my_string_obj)). Of course if you want to set the field to a substring of an existing buffer then this won't help.
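The move-based setter can be sketched with a stand-in class. `Blob` and `set_data` are hypothetical, and only mimic the shape of a generated message with one bytes field. The rvalue overload takes over the string's buffer instead of copying it, which on common standard library implementations means the character data keeps its address for strings too large for the small-string optimization.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a generated message with one bytes field.
class Blob {
 public:
  // Copies the characters of `value`.
  void set_data(const std::string& value) { data_ = value; }
  // Takes over the buffer of `value`; the caller's string is left
  // in a valid but unspecified state.
  void set_data(std::string&& value) { data_ = std::move(value); }
  const std::string& data() const { return data_; }

 private:
  std::string data_;
};
```

Calling `blob.set_data(std::move(buffer))` selects the second overload; `blob.set_data(buffer)` selects the first and copies.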

(More detail than you probably need/want: technically it is possible to serialise a protocol buffer object to a stream, which means its members are not necessarily copied into a single buffer all at once, but eventually every individual byte will still be copied. Besides, I imagine the synchronous gRPC API probably serialises to a single buffer at once, and the async gRPC API almost certainly does. If allocating such big buffers is a problem, the usual recommendation with gRPC is to break the bytes objects into chunks and use a streaming request to send them. You could use FlatBuffers with gRPC, which would get you zero-copy when reading (the main subject of this issue), but there would still be some copying, e.g. when gRPC makes the system calls that send/receive data. TensorFlow uses gRPC but does some other special tricks to avoid copying tensor data around too much, but I don't believe they are available for use outside that library.)

May 29 '19 21:05 arthur-tacca

@arthur-tacca Thanks for this, it helped.

Jun 09 '19 08:06 msn-tldr

Is there any update on this feature? This would be very useful. Now that absl has a string_view implementation (https://github.com/abseil/abseil-cpp/blob/master/absl/strings/string_view.h) it seems like that could be used :)

Nov 18 '19 22:11 pcmoritz

absl::Cord has just been released: https://github.com/abseil/abseil-cpp/commit/3c814105108680997d0821077694f663693b5382

Feb 19 '20 02:02 prem-nuro

I think there are two requests here:

(a) Allow ctype = STRING_PIECE
(b) Allow ctype = CORD

The original comment says "(1) opensource ... Cord and its dependencies ... is probably the most difficult part" But surely that's only needed for ctype = CORD? For ctype = STRING_PIECE, a vendored copy of StringPiece has been included in open-source protobuf for years. That leaves items (2) and (3) in the original comment (un-excluding the relevant code from the open-source release). This might be a lot less work than releasing the full feature including CORD, assuming (2) and (3) can reasonably be done for STRING_PIECE without also doing them for CORD.

The ctype = STRING_PIECE feature solves the zero-copy problem in the case where the string you want to refer to without copying is contiguous, which is probably enough functionality for many people (e.g. me 😃). So perhaps, rather than waiting for some solution involving Cord, just the string piece functionality could be open sourced?

I thought this was already well understood, but reading through the comments it seems it hasn't been mentioned here before. The comments mostly discuss alternative types such as std::string_view and absl::Cord, but there's been no mention of protobuf::StringPiece.

Jun 03 '20 11:06 arthur-tacca

std::string_view has made our StringPiece type obsolete, so I don't think we want to expose StringPiece publicly in any more places if we can avoid it. Eventually we will likely want to replace it with std::string_view. The main problem is that to get access to std::string_view, we need to require C++17 (currently we only require C++11). The other possibility is to depend on ABSL and use absl::string_view, but that would be a non-trivial change as well.

Jun 03 '20 12:06 acozzette

That makes sense, thanks.

Jun 03 '20 12:06 arthur-tacca