untrusted icon indicating copy to clipboard operation
untrusted copied to clipboard

Add more `read_*` methods for big/little endian integers

Open dcsommer opened this issue 4 years ago • 4 comments

A straightforward feature that would be quite useful for implementing network protocols would be additional methods on Reader for reading big and little endian integers. The API could be something like:

pub fn read_be<T>(&mut self) -> Result<T, EndOfInput>
pub fn read_le<T>(&mut self) -> Result<T, EndOfInput>

Where T is some integer type. This is similar to the C++ library Cursor, from Folly and fits nicely at the level of pulling bytes safely from the buffer.

dcsommer avatar Dec 23 '19 19:12 dcsommer

@dcsommer Borrowing this issue to advertise my crate which is extension to untrusted:

https://crates.io/crates/untrustended

It contains a trait to extend untrusted's Reader with many new methods. I'm using it for making protocol parsers and I'd like to get feedback also from others.

There's also PR #14 to implement more methods directly into untrusted, but it's been slowly cooking do to lack of time.

oherrala avatar Apr 30 '20 11:04 oherrala

@oherrala thanks for the crate link. I noticed untrustended reads byte by byte, whereas the PR reads all the needed bytes at once, so presumably the performance would be slightly better using the PR than the untrustended crate?

dcsommer avatar Apr 30 '20 15:04 dcsommer

@dcsommer Yeah, you are right. I think I benchmarked the PR to be faster than untrustended's way. There's no reason not to do the same thing with untrustended and if there's need to squeeze more speed out then I think it's easy to implement.

oherrala avatar May 03 '20 08:05 oherrala

I think I benchmarked the PR to be faster than untrustended's way. There's no reason not to do the same thing with untrustended and if there's need to squeeze more speed out then I think it's easy to implement.

I would hope that reading byte-by-byte wouldn't matter a huge amount. However, I imagine there is some performance impact and I think we should address that. Maybe we should have something like: fn read_bytes<T>() -> Result<&T, EndOfInput> where T could be an array of bytes [u8; 1], [u8; 2], etc. Then we could write:

let x = u64::from_be_bytes(*reader.read_bytes()?);

Where that would turn into basically a bounds check followed by a MOVBE instruction.

I think the above is not too terrible. I admit it isn't as nice as:

let x: u32 = reader.read_be()?;

Although it's true that there are a lot of data formats that are just big- or little- endian encodings of bytes, I almost never write parsers for those formats. Usually some kind of unusual variable-length encoding is used in the formats I'm interested in. For example, ASN.1, QUIC, UTF-8, HPACK. Definitely we wouldn't put the decoding of all those formats into untrusted, so it seems a little wrong to put the simple endian decoding into untrusted. However, maybe the simple endian encoding is just so common that lots of projects would immediately be able to delete a bunch of code if we added it to untrusted?

Further, what I've noticed in ring regarding simple endian-encoded values is that I usually want to read the next N bytes and get an &[u8; N] and then reinterpret those bytes as an &[[u8; 4]; N/4] and then copy those values into an (appropriately-aligned) array of BigEndian<u32>. Sometimes N is a constant so these are really Rust arrays (e.g. in almost every crypto part of ring except RSA), and other times N isn't a constant (e.g. RSA in ring) so the result would be a slice &[[u8; 4]]. Very rarely am I reading/writing single endian-encoded u32/u64/etc. values.

briansmith avatar Jan 26 '21 06:01 briansmith