bytes Request: Provide try_get

Request: Provide try_get_* methods

Open oblique opened this issue 5 years ago • 12 comments

Some network protocols have dynamic length packets, so we can not have a single buf.remaining() check before starting parsing them. In that case before get_* we need to check the remaining length to avoid assertions.

I believe try_get_* methods that do not assert they will help us write cleaner code.

Apr 14 '19 12:04 oblique

I agree this would be helpful to have. If you look at the rust-fuzz trophy case, many of the bugs in the oor category could have been prevented with this type of interface.

I think we need to answer a couple of questions, first.

`advance` and `copy_to_slice`

Ideally, we should include a non-panicking version advance and copy_to_slice. This would keep things consistent with not wanting to panic and require explicitly opting in to panicking, rather than opting out.

Change existing APIs

Would it be better, instead, to change the existing APIs to return Result<T, Error>?

Pros

Considering a lot of the use cases for Buf handle untrusted input, I think this is the best direction long term, as it forces implementations to consider the possibility of a EOF.
Adding a try_ method per T would essentially double the already-lengthy trait API and docs page.

It would still be possible to panic, if desired:

buf.get_u8().expect("i've already checked the length");

Compared to the amount of boilerplate it currently requires to not panic:

// this also isn't guaranteed to be the length of `T` - easy to drift from the actual impl
if buf.remaining() < 4 {
    return Err(EOF);
}
let v = buf.get_u32();

Cons

It would most likely break a lot of existing usage of Buf (but now would be the time to do it before 1.0)
Panicking would require an additional .unwrap()

Return type

In my mind, we have 3 options:

Option<T> This option is attractive, as it doesn't introduce any new public types. My biggest problem with it, though, is it becomes kind of tedious to use in a function that returns a Result:

fn decode<B: Buf>(buf: B) -> Result<(), MyError> {
    let a = buf.try_get_u8().ok_or(MyError::UnexpectedEnd)?;
    let b = buf.try_get_u8().ok_or(MyError::UnexpectedEnd)?;
    let c = buf.try_get_u8().ok_or(MyError::UnexpectedEnd)?;
    Ok(())
}

whereas it's a bit easier to deal with the inverse:

fn decode<B: Buf>(buf: B) -> Option<()> {
    let a = buf.try_get_u8().ok()?;
    let b = buf.try_get_u8().ok()?;
    let c = buf.try_get_u8().ok()?;
    Some(())
}

You could implement From<NoneError> for MyError but that could happen anywhere for various reasons.

Result<T, struct UnexpectedEnd> Assuming all of the methods only check that buf.remaining() > size_of::<T>() then this is a fine option. If at some point in the future, there's another error condition we want to signal, it would be a breaking change to use an enum instead.
Result<T, #[non_exhaustive] enum TryGetError> This is probably the most stable option long term, although may be overkill if it'll only ever be a single variant.

Oct 16 '20 20:10 camshaft

Result<T, struct UnexpectedEnd>

That's what I would lean towards. Since the APIs here are just about reading, I don't see a need for more error variants. Buf doesn't cover serialization/deserialization of higher level data structures which could produce additional errors. Code which does that can define their own errors, and implement From<UnexpectedEnd> for call to Buf

Oct 16 '20 23:10 Matthias247

Result<T, struct UnexpectedEnd>

We use this pattern heavily in quinn and it works great. We even have an extension trait for all Buf implementers to provide methods of that form for various integer types, plus other assorted trivially-represented stuff.

Would it be better, instead, to change the existing APIs to return Result<T, Error>?

I'm a fan of this approach, since it's overwhelmingly the way we use the Buf API.

Oct 17 '20 01:10 Ralith

After much bikeshedding in Discord, I'm leaning back towards providing try_get_* variants.

The reasoning is, while it makes a lot of sense for Buf to provide getters that return Result, it makes far less sense for BufMut to provide setters to return Result. The buffer is usually pre-allocated to contain the full message.

It also doesn't feel great for BufMut to provide setters that panic and Buf getters return Result. To maintain symmetry, Buf::get_[T] can panic and we can add Buf::try_get_[T] to return a result.

Oct 20 '20 17:10 carllerche

I'm also a big fan "no-panic by default". IMO, out-of-bounds panicking is one of the biggest design mistakes in std; similar to Java's NullPointerExceptions. There are situations where panicking is the only reasonable default to handle errors, but OOB is usually none of them.

As far as I can tell, bytes is often used in crates with highly untrusted input where you almost never want to panic just because some foreign input data is truncated. So the default API should be failsafe, because you can always still expect a value if appropriate – but explicit is better than implicit here.

A try_-API has two disadvantages:

First, it feels like it's a bit against Rust's design philosophy IMO? Currently most of Rust is like: "Ah nope, this is dangerous… You sure you wanna do this?", whereas an x and try_x dualism is more like: "Ok, will do… (Oh, and if you want me to perform some checks – tell me, ok?)". I'm a big fan of types like Option<T> or Result<T, E> for fallible operations, because they allow/force me to notice that something can go wrong here (saved my ass numerous times 😅).

Also in this case, try_get_*-APIs feel like 2nd-class citizens and are easy to forget – if you accidentally use get_* instead of try_get_*?, there is no compiler warning or lint etc., so it's easy to miss. Furthermore, there is already a get and get_mut-API for slices/sliceables which returns an Option<T>, and it's pretty easy to confuse it with get_* if you work with both types.

When it comes to the specific API, I'm in favor of returning an Option<T>:

It is consistent with std's slice::get and slice::get_mut.
Not having any bytes left may not even be an error (e.g. if I parse an indefinite list of concatenated u32s).
If it is an error, you'd probably want to use your own error type anyway, because the errors are usually context specific (i.e. in some parts I'd e.g. want to return InvalidHeader whereas in other parts I might want sth. like NeedMoreInput as error). So something like impl From<UnexpectedEnd> for MyError is not very useful except in some edge cases IMO?

Dec 20 '20 02:12 KizzyCode

As mentioned in discord, I think something like this could work pretty well: https://godbolt.org/z/hjYbeEf5P.

We could also do a blanket impl: impl<B: TryBuf> Buf for B { ... } and delegate calls with an expect(...). It does mean implementations would only be able to pick one trait to implement and if they've implemented Buf, they wouldn't be able to implement TryBuf, at least until specialization is stabilized.

Sep 20 '21 23:09 camshaft

in the meantime, this might do what you're looking for - https://github.com/danieleades/safer-bytes

Oct 19 '21 10:10 danieleades

@danburkert Your crate is exactly what I was looking for! Thank you so much for making it public.

i'm please to hear it might help. That inspired me to tidy it up a little

Nov 29 '21 17:11 danieleades

Currently I'm having to use two dependencies for working with bytes, bytes for writing and byteorder for reading, so I'm hoping this request gets fulfilled because it would simplify things :) byteorder returns io::Results containing ErrorKind::UnexpectedEof, which comes from Read::read_exact.

Mar 01 '22 03:03 ghost

Another implementation is

use bytes::Buf;
use std::mem::size_of();

#[derive(Clone, Copy, Debug)]
pub enum ErrorKind {
    EOF
}

pub trait TryBuf {
    fn try_get_u8(&mut self) -> Result<u8,ErrorKind>;
    // ...other try_get APIs
}

impl<T: Buf> TryBuf for T {
    fn try_get_u8(&mut self) -> Result<u8,ErrorKind> {
        if (size_of::<u8>() <= self.remaining()) {
            Ok(self.get_u8())
        } else {
            Err(ErrorKind)
        }
    }
    // ...other impls
}

This is efficient and will never break any existing usage of Buf.

I implement this in try_buf

Aug 13 '22 10:08 wheird-lee

The issue with first checking that it is enough, and then reading, is that the reading part can very easily be changed by yourself or someone without updating the above check, causing a possible panic point. It is also quite error prone and unclear by hand calculating the size, E.g; having a literal 4 or 32 and not knowing why, which will be outdated if someone comes aloing and changes a get_u8 to a get_u16 or similar.

In other words, panics become a surprise because some code changed further down, which goes against the general design of Rust of not upholding invariants yourself, E.g parse and not validate, or using types rather than checks to commicate impossibilites and guarantees.

Furthermore, the manual method makes it very hard to validate if the size depends on data that has already been read, like branching code, which requires making each "section" do its own validating and manual calculation of what the code already knowns and encode. Similar in a way to hand calculating stack offsets based on function bodies, which we all know how well that goes as son as there is a branch.

Though this is just my take on the matter when writing networking code and attempting to uphold and remember implicit assumptions as functions may panic otherwise with no clear indication that they will based on signature.

May 04 '23 13:05 ten3roberts

Hello, Pretty new to Rust and I just ran into this issue in late 2023; would be great to get these methods into the bytes crate. For now I've added the trait suggested by @wheird-lee into my project.

It's worked a treat, thanks 👍

Happy to help with this where I can

Oct 17 '23 21:10 tombanksme

bytes bytes copied to clipboard

Request: Provide try_get_* methods

advance and copy_to_slice

Change existing APIs

Pros

Cons

Return type

bytes
bytes copied to clipboard

`advance` and `copy_to_slice`