rasn icon indicating copy to clipboard operation
rasn copied to clipboard

add feature to interop with `bytes` crate, add `encode_buf` and `decode_buf`?

Open shahn opened this issue 1 year ago • 7 comments

I guess a lot of people are working with the bytes crate when using tokio and network protocols. It would be nice if rasn could work directly with the relevant types, with an API similar to https://docs.rs/tokio/latest/tokio/io/trait.AsyncReadExt.html#method.read_buf (so instead of encode you'd call encode_buf), to allow re-using buffers.

shahn avatar Sep 13 '24 18:09 shahn

Thank you for your issue! However we already do support Bytes, we just have it aliased as OctetString.

XAMPPRocky avatar Sep 14 '24 11:09 XAMPPRocky

Hi XAMPPRocky, thanks for your quick reply! I am not sure if maybe there's a misunderstanding: It seems you're using the OctetString as an alias for bytes::Bytes, but not to decode from/encode into, but rather to support the ASN1 data type. As far as I see, right now one would decode from a &[u8] and encode into a Vec, rather than using the bytes crate for that.

shahn avatar Sep 14 '24 12:09 shahn

I have a working branch which introduces new lifetime type for Encoder and allows reusing the same buffer recursively quite much with the new AnyEncoder associated type, without Rc or RefCell. It works with plain &mut Vec<u8> type, for OER and maybe for PER too with .view_bits_mut::<Msb0>(); (haven't tested yet).

From performance wise, it would also make sense to introduce encode_buf method since it can more than half the required allocations if the same buffer can be reused. However, I am not sure if this type could be Bytes, at least internally, since it is much slower. Generic type internally would also make things complicated. Would there be some other approach?

Nicceboy avatar Oct 30 '24 16:10 Nicceboy

Interesting, do you know why Bytes is so much slower? Perhaps that is something that also would be of interest to the tokio folks, since I assume they care about Bytes performance a lot

shahn avatar Oct 30 '24 16:10 shahn

This might a bit unfair comparision since Vec might be as fast as it gets, if you don't need some features of Bytes type. Bytes type has an advantage that you can split or slice the buffer without any new allocations (just by getting new "view" to the same buffer), so if you need to do that often, it might be faster in the end. It uses reference counters, and internal structure is more complicated, so just adding data to the buffer is slower in general, and the tradeoff comes here. It also allows shared ownership and zero-copy cloning.

There are actually few cases where I haven't tested yet where that splitting could be valuable and maybe the performance difference is not that clear. In general, there are many cases where you need to store the encoded values in separate buffer before you can calculate the length, and splitting can be useful if it saves some allocations or moving/cloning. But even if it would be possible to use Bytes in OER, for example , PER has a high dependency of BitVec so that change is unlikely and would require major rework and encode_buf should work somewhat similarly for all codecs.

Of course encode_buf could be also implemented "inefficiently" so that the caller provides the buffer but the codec does not use it internally and just moves the data in there in the end.

Nicceboy avatar Oct 30 '24 17:10 Nicceboy

I mean, I think the last option is not really what I was after, the goal was to reduce allocations for sure :) I use Bytes a lot for receiving data from the network and then working with it, to avoid allocations (I had some trouble with memory fragmentation in a different project, so kind of defaulting to it). But even providing a Vec which can be re-used would be helpful for that.

Thanks for educating me on the intricacies of different encodings, I am pretty new to ASN.1.

shahn avatar Oct 30 '24 19:10 shahn

Just a note, OER has now encode_buf for &mut Vec<u8> type for starters. There is internally second buffer for length calculations but other than that, allocations are now almost at theoretical minimum. Traits should support adding the same for other codecs too.

Nicceboy avatar Nov 27 '24 05:11 Nicceboy