Double-copy merging String and Vec<u8> from inputs that are not Bytes
While measuring allocations in a performance-sensitive project, I found that the zero-copy Bytes trick in https://github.com/tokio-rs/prost/pull/449 actually increases the number of heap allocations and copies if neither the input buffer nor the field to be merged is Bytes.
Observations from my testing:
(1) If the input and output are both Bytes, then buf.copy_to_bytes() is zero-copy, and then <Bytes as BytesAdapter>::replace_with() is also zero-copy. This is the ideal case made possible by the above PR.
(2) If the input is not Bytes, then buf.copy_to_bytes() allocates a new intermediate Bytes and copies into it.
(3) If the output is not Bytes, then <Vec<u8> as BytesAdapter>::replace_with() will allocate a new vector and copy out of the intermediate Bytes.
The problem is that both (2) and (3) can easily occur in the same merge call. For example, prost-build generates Vec<u8> and String fields which trigger (3), and many users would attempt to parse from &[u8] or even a dynamic reader which is not contiguous, triggering (2).
I found the issue because I was following the general idiom of using &[u8] as my input type to keep the API simple and not expose dependencies, but measuring allocations and CPU time benchmarking both proved that this introduced substantial costs compared to Bytes.
I worry that many other users may also be triggering this case without knowing. Even if you had a way to explicitly tell users about it, I do not think it would be sufficient to require users to always use Bytes. If the other APIs in their program (especially externally-facing ones) constrain them to require other types, then this would just move the extra copies into their code instead of actually avoiding them where possible.
I believe it is easy to fix this for String because its merge implementation is neatly encapsulated in encoding::string::merge(). I am about to send a PR with a conservative fix for just this.
It would also be good to fix this for Vec<u8>, but I currently do not see a way to do that without changes to the external API of prost to make a distiction between merging from Bytes and merging from other types of buffers. As explicitly mentioned in earlier versions of encoding.rs, generic specialization would be a cleaner way to do this transparently, so it might make sense just to leave a comment and wait for that.
Thanks for digging into this.
I wonder if we should just make Bytes the default?