
proto: add scatter/gather API for serialization

Open MakMukhi opened this issue 6 years ago • 13 comments

Serialized data read from the wire often spans several data frames. The current proto.Unmarshal API requires that data from all these frames be copied into a single contiguous slice. This extra copying (and memory allocation) can be avoided if we have a gather-style unmarshal that can deserialize data from multiple byte slices.

If there's agreement on the proposed API, I'd be happy to work on its implementation as well.
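
For concreteness, one hypothetical shape such an API could take (names and signatures are illustrative only; nothing here is settled):

// UnmarshalGather would decode pb from wire data spread across several
// non-contiguous byte slices, avoiding the copy into one buffer.
func UnmarshalGather(bufs [][]byte, pb Message) error

// MarshalScatter would be the symmetric operation: serialize pb into
// caller-provided buffers instead of one freshly allocated slice.
func MarshalScatter(pb Message, bufs [][]byte) (int, error)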

MakMukhi avatar May 16 '18 23:05 MakMukhi

\cc @randall77 @LMMilewski

dsnet avatar May 17 '18 04:05 dsnet

If we're going to go this route we should do UnmarshalStream(pb Message, r io.ReadSeeker). Then we could use a seeker version of io.MultiReader to combine byte streams.

(We need seeker to allow multi-pass algorithms.)
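
A rough sketch of what a seekable multi-reader over byte slices could look like (multiReadSeeker is a hypothetical helper; the standard io.MultiReader does not return a Seeker):

package proto

import (
	"errors"
	"io"
)

// multiReadSeeker is a hypothetical seekable analogue of io.MultiReader:
// an io.ReadSeeker over several byte slices.
type multiReadSeeker struct {
	bufs [][]byte
	size int64
	off  int64
}

func newMultiReadSeeker(bufs ...[]byte) *multiReadSeeker {
	m := &multiReadSeeker{bufs: bufs}
	for _, b := range bufs {
		m.size += int64(len(b))
	}
	return m
}

// Read copies from the buffer containing the current offset; callers may
// get short reads at buffer boundaries, which the io.Reader contract allows.
func (m *multiReadSeeker) Read(p []byte) (int, error) {
	if m.off >= m.size {
		return 0, io.EOF
	}
	skip := m.off
	for _, b := range m.bufs {
		if skip < int64(len(b)) {
			n := copy(p, b[skip:])
			m.off += int64(n)
			return n, nil
		}
		skip -= int64(len(b))
	}
	return 0, io.EOF
}

// Seek repositions the offset, which is what enables multi-pass algorithms.
func (m *multiReadSeeker) Seek(offset int64, whence int) (int64, error) {
	switch whence {
	case io.SeekCurrent:
		offset += m.off
	case io.SeekEnd:
		offset += m.size
	}
	if offset < 0 {
		return 0, errors.New("proto: negative seek position")
	}
	m.off = offset
	return offset, nil
}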

randall77 avatar May 17 '18 13:05 randall77

I'm intentionally trying to stay away from the io.Reader API, since every read operation requires a buffer to be filled by copying into it (Read(b []byte)).

My current idea is to take in this slice of slices and create a wrapper around it that provides the slice-like functionality the implementation looks for.

For instance, decodeVarInt needs to read b[n]; our wrapper data structure can provide a method for that: ReadByteAt(loc int) (byte, error).

This way we make minimal changes to the current implementation.

MakMukhi avatar May 17 '18 14:05 MakMukhi

I should mention that the net package uses an io.Reader along with type assertions to achieve this behavior. See golang/go#17607 and net.Buffers.

dsnet avatar May 17 '18 14:05 dsnet

That does look promising. We can leverage net.Buffers by giving its WriteTo method a custom io.Writer that hands the written slices back to us. This does require changing the implementation a little more, but not substantially. However, it makes me think of another solution: have our data structure just return the entire current slice and move on to the next one. The implementation changes would be exactly the same as with net.Buffers, without the added complexity.
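
For illustration, a minimal collector along those lines (sliceCollector and gather are hypothetical names; retaining the slices passed to Write is normally discouraged for an io.Writer, but here it is the whole point):

package proto

import "net"

// sliceCollector is an io.Writer that records each slice passed to Write
// instead of copying its contents anywhere.
type sliceCollector struct {
	got [][]byte
}

func (c *sliceCollector) Write(p []byte) (int, error) {
	c.got = append(c.got, p)
	return len(p), nil
}

// gather drains bufs via WriteTo; for a plain io.Writer, WriteTo calls
// Write once per underlying buffer, so we get the original slices back
// without copying the bytes.
func gather(bufs net.Buffers) [][]byte {
	var c sliceCollector
	bufs.WriteTo(&c) // error ignored: sliceCollector never fails
	return c.got
}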

MakMukhi avatar May 17 '18 15:05 MakMukhi

Here's the wrapper data structure that I think can be hooked into the current implementation with very little friction:


package proto

import (
	"io"
	"unsafe"
)

// emptyBuf backs the non-nil, zero-length slice returned for
// zero-byte reads.
var emptyBuf [0]byte

// byteSlice wraps a slice of byte slices
// to provide slice like functions on it.
type byteSlice struct {
	ofst int
	bufs [][]byte
	size int
}

func newByteSlice(bufs ...[]byte) *byteSlice {
	bs := &byteSlice{
		bufs: bufs,
	}
	for _, b := range bufs {
		bs.size += len(b)
	}
	return bs
}

// peekByteAt reads the byte at offset loc from the current
// position. It returns an error if there's no more data.
// It doesn't move ofst.
func (s *byteSlice) peekByteAt(loc int) (byte, error) {
	if s.ofst+loc >= s.size {
		return 0, io.ErrUnexpectedEOF
	}
	for _, b := range s.bufs {
		if len(b) > loc {
			return b[loc], nil
		}
		loc -= len(b)
	}
	panic("Incorrect implementation!")
}

// moveBy moves ofst by n.
func (s *byteSlice) moveBy(n int) error {
	if s.ofst+n > s.size {
		return io.ErrUnexpectedEOF
	}
	s.ofst += n
	for n > 0 {
		if len(s.bufs[0]) > n {
			s.bufs[0] = s.bufs[0][n:]
			return nil
		}
		n -= len(s.bufs[0])
		s.bufs = s.bufs[1:]
	}
	return nil
}

// Note: this allocates new memory and should be only
// used when assigning a byte slice to a proto.Message
// field.
func (s *byteSlice) readn(n int) ([]byte, error) {
	// We want a non-nil value returned when n is 0.
	if n == 0 {
		return emptyBuf[:], nil
	}
	if s.ofst+n > s.size {
		return nil, io.ErrUnexpectedEOF
	}
	// TODO(mmukhi): Evaluate variable usage to reduce
	// cache misses.
	s.ofst += n
	var p []byte
	for n > 0 {
		if len(s.bufs[0]) == 0 {
			s.bufs = s.bufs[1:]
			continue
		}
		sz := n
		if sz > len(s.bufs[0]) {
			sz = len(s.bufs[0])
		}
		// The use of append here is a trick which avoids the zeroing
		// that would be required if we used a make/copy pair.
		p = append(p, s.bufs[0][:sz]...)
		s.bufs[0] = s.bufs[0][sz:]
		n -= sz
	}
	return p, nil
}

// Note: this allocates new memory and should be only
// used when assigning a string to a proto.Message
// field.
func (s *byteSlice) readString(n int) (string, error) {
	b, err := s.readn(n)
	if err != nil {
		return "", err
	}
	// This unsafe conversion avoids the extra allocation and copy made
	// by a normal []byte-to-string conversion; it's safe here because b
	// is never reused after this point.
	return *(*string)(unsafe.Pointer(&b)), nil
}

func (s *byteSlice) length() int {
	return s.size - s.ofst
}

func (s *byteSlice) isEmpty() bool {
	return s.ofst >= s.size
}
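
To show how this could hook into the decode path, here is a hypothetical varint decoder built on peekByteAt and moveBy (loosely modeled on the shape of the existing decodeVarInt loop; errVarintOverflow and the exact limits are illustrative, and an extra "errors" import would be needed):

var errVarintOverflow = errors.New("proto: varint overflows a 64-bit integer")

func (s *byteSlice) decodeVarint() (uint64, error) {
	var v uint64
	for i := 0; i < 10; i++ {
		// Peek without committing; ofst only moves once the
		// varint's length is known.
		b, err := s.peekByteAt(i)
		if err != nil {
			return 0, err
		}
		v |= uint64(b&0x7f) << uint(7*i)
		if b < 0x80 {
			if err := s.moveBy(i + 1); err != nil {
				return 0, err
			}
			return v, nil
		}
	}
	return 0, errVarintOverflow
}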

MakMukhi avatar May 17 '18 22:05 MakMukhi

@MakMukhi would you know how much this change would improve grpc benchmarks?

LMMilewski avatar May 17 '18 22:05 LMMilewski

This will cut the memory footprint of each RPC call (request-response) by half.

MakMukhi avatar May 17 '18 23:05 MakMukhi

I think this is needed in Go. And if you implement this, look at the new C++ parser loop. The new parser is quite a bit faster and becomes very flexible due to resumability. In Go this can probably be implemented easily and elegantly with goroutines, removing the need for a side stack.

gerben-s avatar Aug 27 '18 20:08 gerben-s

Goroutines are certainly cheap, but not so cheap that you would want to spawn one for every unmarshal operation and deal with the synchronization.

In Go, the standard library only has support for writev-like functionality, but readv support is still open (see #17607). Personally, I am hesitant to see this functionality in protobufs unless it is symmetrical (supports both marshal and unmarshal). Furthermore, I kind of want to see how #17607 plays out with regard to readv first, as I suspect it will have some effect on the API design here.

dsnet avatar Sep 10 '18 21:09 dsnet

I think supporting an io.Reader/io.Writer API is needed; the TensorFlow model is about 1-2 GB, so I need 5 GB of memory to load it.

nomadli avatar Dec 19 '18 07:12 nomadli

I'm intentionally trying to stay away from the io.Reader API, since every read operation requires a buffer to be filled by copying into it (Read(b []byte)).

Looking back on this -- I think it's possible Reader/Writer can be used efficiently if done the right way, even though we initially believed they could not. To me, the intuitive way would be an API like this:

func UnmarshalFrom(io.Reader, proto.Message) error

However, this results in an extra, unnecessary copy as the Reader implementation copies its data into the buffer passed to it for the protobuf library to consume.

If we were to flip that around, we could avoid that extra copy. Something like:

type Unmarshaller struct {}

func (u *Unmarshaller) Write([]byte) (int, error) {}
func (u *Unmarshaller) Message() (Message, error) {} // or maybe "Close"
   // and the Unmarshaller wraps a proto.Message which is valid after
   // Close returns

Here the implementation of Write would consume the bytes, only copying to maintain partial state dependent on subsequent data. This means the provider of the data could pass its buffer to the protobuf library, and no additional copy would be required.
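
For example, a transport could drive that API roughly like this (decodeFrames is a hypothetical helper, and it assumes the Unmarshaller has been constructed around the target message):

func decodeFrames(u *Unmarshaller, frames [][]byte) (Message, error) {
	for _, f := range frames {
		// Each frame is consumed as it arrives; no concatenation.
		if _, err := u.Write(f); err != nil {
			return nil, err
		}
	}
	return u.Message()
}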

Vice versa, a marshaller could provide an io.Reader, and the proto library would serialize directly into the user's buffer(s), avoiding an unnecessary copy on the way out.
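
A corresponding sketch for that direction (again just a shape, like the Unmarshaller above):

// Marshaller (hypothetical) implements io.Reader: each Read serializes
// the next chunk of the wrapped message directly into the caller's
// buffer, returning io.EOF once the whole message has been written.
type Marshaller struct {
	msg Message
	// unexported resume state (which field/offset to continue from)
}

func (m *Marshaller) Read(p []byte) (int, error)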

@dsnet - what do you think?

dfawley avatar Dec 20 '18 21:12 dfawley

@dfawley does it look like C++'s zero-copy stream? https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.zero_copy_stream

yetingsky avatar Jul 27 '19 08:07 yetingsky