proto: add scatter/gather API for serialization
Serialized data read from the wire often spans several data frames. The current proto.Unmarshal API requires that data from all these frames be copied into a single slice.
This extra copying (and memory allocation) can be avoided if we have an unmarshal-gather API that can deserialize data from multiple byte slices.
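For concreteness, here is a rough sketch of the semantics such a gather call could have; the name unmarshalGather is hypothetical, and this naive version still performs the very copy the proposal wants to eliminate, so it only pins down the intended behavior (it assumes the existing Unmarshal and Message from the proto package):

// unmarshalGather is a hypothetical gather-style entry point. This naive
// version only illustrates the intended semantics: it still concatenates
// the frames before decoding, which is exactly the copy a real
// scatter/gather implementation would avoid.
func unmarshalGather(bufs [][]byte, pb Message) error {
	var n int
	for _, b := range bufs {
		n += len(b)
	}
	data := make([]byte, 0, n)
	for _, b := range bufs {
		data = append(data, b...)
	}
	return Unmarshal(data, pb)
}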
If there's agreement on the proposed API, I'd be happy to work on its implementation as well.
\cc @randall77 @LMMilewski
If we're going to go this route we should do UnmarshalStream(pb Message, r io.ReadSeeker). Then we could use a seeker version of io.MultiReader to combine byte streams. (We need a seeker to allow multi-pass algorithms.)
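As a sketch, the non-seeking half of this already exists in the standard library; the seekable variant needed for multi-pass parsing would have to be written (the package name below is made up for illustration):

package framing // hypothetical package name, for illustration

import (
	"bytes"
	"io"
)

// combineFrames stitches wire frames into a single io.Reader without
// copying their contents. Note that io.MultiReader's result does not
// implement io.Seeker, so the seekable variant needed for multi-pass
// parsing would require a custom implementation.
func combineFrames(frames [][]byte) io.Reader {
	readers := make([]io.Reader, len(frames))
	for i, f := range frames {
		readers[i] = bytes.NewReader(f)
	}
	return io.MultiReader(readers...)
}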
I'm intentionally trying to stay away from the io.Reader API, since every read operation requires a buffer to be filled in by copying into it (Read(b []byte)).
My current idea is to take in this slice of slices and create a wrapper around it which provides the slice-like functionality the implementation looks for. For instance, decodeVarInt looks to read b[n]; our wrapper data structure can have a method for that, ReadByteAt(loc int) (byte, error). This way we make minimal changes to the current implementation.
I should mention that the net package uses an io.Reader along with type assertions to achieve this behavior. See golang/go#17607 and net.Buffers.
That does look promising. We can leverage net.Buffers by giving its WriteTo method a custom io.Writer that hands each written slice back to us. This does require changing the implementation a little more, but not substantially.
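Roughly, that custom writer could look like the sketch below (hypothetical names; note that an io.Writer is normally not supposed to retain the slice passed to Write, so this leans on the caller keeping its buffers alive and unmodified):

package gather // hypothetical package name, for illustration

import "net"

// sliceCapturer records each slice handed to Write instead of copying it.
type sliceCapturer struct {
	slices [][]byte
}

func (c *sliceCapturer) Write(p []byte) (int, error) {
	c.slices = append(c.slices, p)
	return len(p), nil
}

// captureBuffers recovers the individual byte slices from a net.Buffers
// value: for a plain io.Writer, net.Buffers.WriteTo issues one Write call
// per buffer.
func captureBuffers(bufs net.Buffers) ([][]byte, error) {
	var c sliceCapturer
	_, err := bufs.WriteTo(&c)
	return c.slices, err
}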
However, this makes me think of another solution: have our data structure just return the entire current slice and move on to the next one. The implementation changes will be exactly the same as with net.Buffers, without the added complexity.
Here's the wrapper data structure that I think can be hooked into the current implementation with very little friction:
package proto

import (
	"io"
	"unsafe"
)

// emptyBuf backs the non-nil, zero-length slice returned by readn(0).
var emptyBuf [0]byte

// byteSlice wraps a slice of byte slices
// to provide slice-like functions on it.
type byteSlice struct {
	ofst int
	bufs [][]byte
	size int
}

func newByteSlice(bufs ...[]byte) *byteSlice {
	bs := &byteSlice{
		bufs: bufs,
	}
	for _, b := range bufs {
		bs.size += len(b)
	}
	return bs
}

// peekByteAt reads the byte at offset loc from the current position.
// It returns an error if there's no more data. It does not advance ofst.
func (s *byteSlice) peekByteAt(loc int) (byte, error) {
	if s.ofst+loc >= s.size {
		return 0, io.ErrUnexpectedEOF
	}
	for _, b := range s.bufs {
		if len(b) > loc {
			return b[loc], nil
		}
		loc -= len(b)
	}
	panic("Incorrect implementation!")
}

// moveBy moves ofst forward by n.
func (s *byteSlice) moveBy(n int) error {
	if s.ofst+n > s.size {
		return io.ErrUnexpectedEOF
	}
	s.ofst += n
	for n > 0 {
		if len(s.bufs[0]) > n {
			s.bufs[0] = s.bufs[0][n:]
			return nil
		}
		n -= len(s.bufs[0])
		s.bufs = s.bufs[1:]
	}
	return nil
}

// readn reads the next n bytes into a newly allocated slice and advances ofst.
// Note: this allocates new memory and should only be used when assigning
// a byte slice to a proto.Message field.
func (s *byteSlice) readn(n int) ([]byte, error) {
	// We want a non-nil value returned when n is 0.
	if n == 0 {
		return emptyBuf[:], nil
	}
	if s.ofst+n > s.size {
		return nil, io.ErrUnexpectedEOF
	}
	// TODO(mmukhi): Evaluate variable usage to reduce
	// cache misses.
	s.ofst += n
	var p []byte
	for n > 0 {
		if len(s.bufs[0]) == 0 {
			s.bufs = s.bufs[1:]
			continue
		}
		sz := n
		if sz > len(s.bufs[0]) {
			sz = len(s.bufs[0])
		}
		// The use of append here is a trick which avoids the zeroing
		// that would be required if we used a make/copy pair.
		p = append(p, s.bufs[0][:sz]...)
		s.bufs[0] = s.bufs[0][sz:]
		n -= sz
	}
	return p, nil
}

// readString reads the next n bytes as a string and advances ofst.
// Note: this allocates new memory and should only be used when assigning
// a string to a proto.Message field.
func (s *byteSlice) readString(n int) (string, error) {
	b, err := s.readn(n)
	if err != nil {
		return "", err
	}
	// This trick is to prevent the extra memory allocation made by
	// casting a byte slice to a string.
	return *(*string)(unsafe.Pointer(&b)), nil
}

// length reports how many unread bytes remain.
func (s *byteSlice) length() int {
	return s.size - s.ofst
}

// isEmpty reports whether all data has been consumed.
func (s *byteSlice) isEmpty() bool {
	return s.ofst >= s.size
}
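To illustrate how the wrapper would hook into the existing decoding loops, here is a hypothetical varint decoder written against it; the real proto decoding code is not shown in this thread, so this only demonstrates the intended call pattern:

// decodeVarintFrom is a hypothetical example of how a varint decoder
// could consume the wrapper: peek at bytes without committing, then
// advance the offset once the value is complete.
func decodeVarintFrom(s *byteSlice) (uint64, error) {
	var x uint64
	for i := 0; i < 10; i++ { // a varint occupies at most 10 bytes
		c, err := s.peekByteAt(i)
		if err != nil {
			return 0, err
		}
		x |= uint64(c&0x7f) << uint(7*i)
		if c < 0x80 {
			if err := s.moveBy(i + 1); err != nil {
				return 0, err
			}
			return x, nil
		}
	}
	// A real decoder would return a dedicated overflow error here;
	// io.ErrUnexpectedEOF stands in for it in this sketch.
	return 0, io.ErrUnexpectedEOF
}

A caller would build the wrapper with newByteSlice(frame1, frame2, ...) and run the decoding loop over it without ever concatenating the frames.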
@MakMukhi would you know how much this change would improve gRPC benchmarks?
This will cut down the memory footprint of each RPC call (request-response) by half.
I think this is needed in Go. And if you implement this, look at the new C++ parser loop. The new parser is quite a bit faster and becomes very flexible due to resumability. In Go this can probably be implemented easily and elegantly with goroutines, removing the need for a side stack.
Goroutines are certainly cheap, but not so cheap that you would want to spawn one for every unmarshal operation and deal with the synchronization.
In Go, the standard library only has support for writev-like functionality, while readv support is still open (see #17607). Personally, I am hesitant to see this functionality in protobufs unless it is symmetrical (supports both marshal and unmarshal). Furthermore, I kind of want to see how #17607 plays out with regard to readv first, as I have a suspicion that it will have some effect on the API design here.
I think supporting an io.Reader/io.Writer API is needed; the TensorFlow module is about 1-2 GB in size, so I need 5 GB of memory to load it.
I'm intentionally trying to stay away from the io.Reader API, since every read operation requires a buffer to be filled in by copying into it (Read(b []byte)).
Looking back on this -- I think it's possible Reader/Writer can be used efficiently if done the right way, even though we initially believed they could not. To me, the intuitive way would be an API like this:
func UnmarshalFrom(io.Reader, proto.Message) error
However, this results in an extra, unnecessary copy, as the Reader implementation copies its data into the buffer passed to it for the protobuf library to consume.
If we were to flip that around, we could avoid that extra copy. Something like:
type Unmarshaller struct {}
func (u *Unmarshaller) Write([]byte) (int, error) {}
func (u *Unmarshaller) Message() (Message, error) {} // or maybe "Close"
// and the Unmarshaller wraps a proto.Message which is valid after
// Close returns
Here the implementation of Write would consume the bytes, only copying to maintain partial state that depends on subsequent data. This means the provider of the data could pass its buffer to the protobuf library, and no additional copy would be required.
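Assuming the Unmarshaller sketched above, the calling pattern would look roughly like this (the helper name feedFrames is made up for illustration):

// feedFrames drives the push-style API: each wire frame is handed to
// Write as-is, and the decoded message is retrieved once all frames
// have been written. Unmarshaller and Message refer to the sketch
// above, not to an existing protobuf API.
func feedFrames(u *Unmarshaller, frames [][]byte) (Message, error) {
	for _, f := range frames {
		if _, err := u.Write(f); err != nil {
			return nil, err
		}
	}
	return u.Message()
}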
Vice versa, a marshaller could provide an io.Reader, and the proto library would serialize directly into the user's buffer(s), avoiding an unnecessary copy on the way out.
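The pull-style flow could look like the sketch below, which only assumes the marshaller exposes io.Reader (the send callback stands in for whatever writes a frame to the wire, and "io" is assumed to be imported):

// drainMarshaller reads serialized bytes directly into the caller's
// frame buffer and hands each filled frame to send, so the wire bytes
// never pass through an intermediate library-owned buffer.
func drainMarshaller(m io.Reader, frame []byte, send func([]byte) error) error {
	for {
		n, err := m.Read(frame)
		if n > 0 {
			if serr := send(frame[:n]); serr != nil {
				return serr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}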
@dsnet - what do you think?
@dfawley does it look like C++'s zero-copy stream? https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.zero_copy_stream