protobuf
protobuf copied to clipboard
document that decoded byte slice shares underlying array with buffer
If you decode a struct which contains a slice of bytes, the returned slice shares the same underlying array with the buffer passed to Decode().
The consequences are:
- If the buffer is later overwritten, the contents of the decoded slice change (see demonstration below).
- If a (potentially large) buffer is no longer in use, the memory underlying array can't be freed as long as there is a reference to the decoded slice of bytes.
I think these consequences aren't intended. It seems better for Decode() to return a copy of the slice of bytes.
package main
import "go.dedis.ch/protobuf"
import "fmt"
func main() {
type foo struct{ Bar []byte }
buf, _ := protobuf.Encode(&foo{[]byte{44, 45}})
var decoded foo
protobuf.Decode(buf, &decoded)
fmt.Println(decoded) // {[44 45]}
buf[len(buf)-1] = 99
fmt.Println(decoded) // {[44 99]}
}
Can you confirm which version you are using? Because we saw a problem related to this before, and wrote TestByteOverwrite to diagnose and fix it in commit 9b0c08d0a258cc23ff5d51ff91c029d7532f5d7c.
The design is that the returned structures point to the data in the underlying buffer. If that does not work for you, you'll need to copy the bytes before you feed them to protobuf.Decode.
Thank you for your response.
As you mentioned, this behaviour is by design. I would argue that a slightly different design is better, with the bytes being copied into a new array (as is already done for non-byte slices). This is for two reasons:
- The (potentially large) buffer passed to protobuf.Decode cannot be freed as long as the (potentially much smaller) decoded slice is still in use. This applies even if you copy the buffer beforehand. So the current behaviour can result in a much larger memory footprint than necessary.
- Since, as far as I can tell, no other returned types point back to the original buffer (including integer slices etc.), the current behaviour can be a surprise for users of the library.
This behavior has caused many headaches for me and my friends working on the DSE Homeworks and it took a long time to track down to exactly this issue - because of the memory aliasing some byte[] that I assumed are immutable become changed unexpectedly much later.
Even if this is by design, it would be good to emphasize this somewhere in the documentation (as currently I didn't find any information about it), because it's not intuitive as @LukasGelbmann mentioned.