Improve nested Lazy buffering
Right now, each Lazy / Blob has its own Vec<u8> in which it stores the raw data. This means that, for example, the code section has its own Vec<u8> covering the whole section and, once parsed, it also contains a bunch of function bodies, each of which is also a Blob with its own Vec<u8> covering the corresponding body.
Since those nested vectors are just subranges of the parent vector, we should be using a more efficient representation that allows them to share the backing buffer (e.g. Rc<[u8]> or similar).
Same goes for user-facing byte-based nested types like Vec<u8> and String - those will need to be changed to something else with a shared backing buffer, but without making them less convenient to use.
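To make the idea a bit more concrete, here is a minimal sketch of what a shared-buffer view could look like. `SharedBytes` and its methods are hypothetical names, not existing types in this crate; the only assumption is an `Rc<[u8]>` plus a range, with `Deref<Target = [u8]>` so that it stays roughly as convenient as `&[u8]` for users. Note that `Rc::from(Vec<u8>)` copies the bytes once into the new allocation, but all nested views after that are free of copies.

```rust
use std::ops::{Deref, Range};
use std::rc::Rc;

/// Hypothetical view into a reference-counted byte buffer.
/// Cloning or slicing it only bumps the refcount; no bytes are copied.
#[derive(Clone, Debug)]
pub struct SharedBytes {
    buf: Rc<[u8]>,
    range: Range<usize>,
}

impl SharedBytes {
    /// Takes ownership of the raw data (one copy into the Rc allocation).
    pub fn new(data: Vec<u8>) -> Self {
        let buf: Rc<[u8]> = Rc::from(data);
        let range = 0..buf.len();
        SharedBytes { buf, range }
    }

    /// Returns a sub-view, with `range` relative to this view.
    /// Panics if the range is out of bounds, like slice indexing does.
    pub fn slice(&self, range: Range<usize>) -> Self {
        assert!(range.start <= range.end && range.end <= self.len());
        SharedBytes {
            buf: Rc::clone(&self.buf),
            range: (self.range.start + range.start)..(self.range.start + range.end),
        }
    }
}

impl Deref for SharedBytes {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}
```

With something like this, the code section would hold one `SharedBytes` and each function body would be a `slice(..)` of it. A string counterpart could wrap the same type and validate UTF-8 once on construction, though that part is not sketched here.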
Another problem is that we're reading from a general std::io::Read, so we'll need some sort of specialization mechanism to know when we're actually reading from an existing byte buffer and could share the bytes instead of doing a streaming read again. Rust currently doesn't have specialization support on stable, but perhaps a dynamic hack could work?
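One possible shape for such a dynamic hack, as a rough sketch only: check at runtime via `std::any::Any` whether the reader is a known in-memory type and, if so, hand out a shared range instead of copying. `Chunk` and `read_chunk` are hypothetical names, and picking `Cursor<Rc<[u8]>>` as the "known" reader is an assumption for illustration. The obvious downside is that `Any` requires the reader to be `'static`, so borrowed readers like `&[u8]` would not fit without further changes.

```rust
use std::any::Any;
use std::io::{self, Cursor, Read};
use std::rc::Rc;

/// Hypothetical result of reading `len` bytes from a source.
pub enum Chunk {
    /// Bytes shared with the original in-memory buffer.
    Shared { buf: Rc<[u8]>, start: usize, end: usize },
    /// Bytes copied out of a streaming reader.
    Owned(Vec<u8>),
}

impl Chunk {
    pub fn as_slice(&self) -> &[u8] {
        match self {
            Chunk::Shared { buf, start, end } => &buf[*start..*end],
            Chunk::Owned(data) => data,
        }
    }
}

/// Reads exactly `len` bytes, sharing them when the reader is a
/// `Cursor<Rc<[u8]>>` and falling back to a streaming copy otherwise.
pub fn read_chunk<R: Read + Any>(reader: &mut R, len: usize) -> io::Result<Chunk> {
    // Runtime type check standing in for compile-time specialization.
    if let Some(cursor) = (&mut *reader as &mut dyn Any).downcast_mut::<Cursor<Rc<[u8]>>>() {
        let start = cursor.position() as usize;
        let buf = cursor.get_ref();
        let end = start
            .checked_add(len)
            .filter(|&end| end <= buf.len())
            .ok_or_else(|| io::Error::from(io::ErrorKind::UnexpectedEof))?;
        let chunk = Chunk::Shared { buf: Rc::clone(buf), start, end };
        // Advance the cursor as if the bytes had been read normally.
        cursor.set_position(end as u64);
        return Ok(chunk);
    }

    // Generic path: a regular streaming read into a fresh buffer.
    let mut data = vec![0u8; len];
    reader.read_exact(&mut data)?;
    Ok(Chunk::Owned(data))
}
```

Whether the extra branch on every read is acceptable performance-wise would need measuring.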
For the record, another angle I had in mind that is worth exploring here: positioned-io, olio, or similar crates.
In particular, positioned-io has a Slice type that, combined with some dyn, might be useful as a storage representing a slice of the original data source.
Experimented with this in the bytes branch by using https://github.com/tokio-rs/bytes. In theory it should be much cheaper thanks to sharing data, but at least performance-wise it's much worse: I'm seeing a ~15% regression in the bench_parse_vec benchmark.
Going to leave as-is for now.