zerocopy icon indicating copy to clipboard operation
zerocopy copied to clipboard

Add `FromBytes`/`IntoBytes` methods which read from/write to an `io::Read`/`io::Write`

Open joshlf opened this issue 2 years ago • 11 comments

Progress

  • [ ] Add FromBytes::read_from_io and IntoBytes::write_to_io in 0.8
  • [ ] In 0.9, perform the following renames:
    • IntoBytes::write_to -> write_to_bytes (to be consistent with FromBytes::read_from_bytes)
    • Rename FromBytes::read_from_io -> read_from and IntoBytes::write_to_io -> write_to
    • Any similar rename for TryFromBytes method if #2619 (or an equivalent PR) lands

Original text

Crosvm has a utility function called zerocopy_from_reader:

pub fn zerocopy_from_reader<R: io::Read, T: FromBytes>(mut read: R) -> io::Result<T> {
    // Allocate on the stack via `MaybeUninit` to ensure proper alignment.
    let mut out = MaybeUninit::zeroed();

    // Safe because the pointer is valid and points to `size_of::<T>()` bytes of zeroes,
    // which is a properly initialized value for `u8`.
    let buf = unsafe { from_raw_parts_mut(out.as_mut_ptr() as *mut u8, size_of::<T>()) };
    read.read_exact(buf)?;

    // Safe because any bit pattern is considered a valid value for `T`.
    Ok(unsafe { out.assume_init() })
}

Maybe we should add something similar to FromBytes (and potentially a write analogue to AsBytes)?

joshlf avatar Feb 16 '23 20:02 joshlf

@nwplayer123 has also requested this feature.

joshlf avatar Sep 18 '24 15:09 joshlf

The obvious name, IMO, for this is read_from, but that name is currently squatted by a #[doc(hidden)] and #[deprecated].

Since it's marked #[doc(hidden)], it doesn't have any SemVer obligations, but in the interest of providing a smooth transition for users, perhaps we wait a release or two before revising it?

If this is the course we take, we should update its deprecation message to something like this:

#[deprecated(since = "0.8.0", note = "renamed to `FromBytes::read_from_bytes`. Migrate immediately. This item will soon be re-used for a different routine.")]

jswrenn avatar Sep 18 '24 16:09 jswrenn

Not sure what to call the _from_prefix variant, since read_from_prefix is already used, too (except for a visible, not deprecated function).

jswrenn avatar Sep 18 '24 16:09 jswrenn

A few considerations:

  • I don't mind read_from_io_read or similar. It's weird, but also consistent with our naming convention of read_from_<place>
  • I don't think we need prefix/suffix variants since a reader vends single bytes at a time, and so it can always give us exactly size_of::<Self>() bytes
  • We will want a read_from_with_elems for Self: ?Sized + KnownLayout
  • We may also want an analogous IntoBytes::write_to method. In that case, perhaps we should rename our existing write_to to write_to_bytes for consistency?

joshlf avatar Sep 18 '24 16:09 joshlf

  • I don't think we need prefix/suffix variants since a reader vends single bytes at a time, and so it can always give us exactly size_of::<Self>() bytes

There's an oddity about Read here in that it provides read_to_end so it's quite possible to distinguish between the read-all and read-prefix cases.

jswrenn avatar Sep 18 '24 16:09 jswrenn

IMO it's fine to provide the minimal API and let users manually construct fancier use cases. All of this can be built safely using our existing primitives anyway.

joshlf avatar Sep 18 '24 16:09 joshlf

  • I don't mind read_from_io_read or similar. It's weird, but also consistent with our naming convention of read_from_<place>

Throwing out the option of read_from_stream, since the Read trait embodies a stream of bytes that you can just pull from instead of a discrete byte slice.

NWPlayer123 avatar Sep 19 '24 03:09 NWPlayer123

It would be helpful to consider TryFromBytes here too.

For example, I have this enum that represents compression:

#[repr(u8)]
pub enum Compression {
    Stored,
    Zlib,
    BZip2,
    LZMA1,
    LZMA2,
}

Currently, I use FromRepr from strum to get an enum from the byte:

let compression = reader.read_u8()?;
header.compression = Compression::from_repr(compression)
    .ok_or_else(|| eyre!("Unexpected compression value: {compression}"))?;

Now that we have TryFromBytes, I could use that instead. However, as I can't just zero the type and write data to it like you could do when using FromBytes, it requires me to read the data into a buffer, and then use TryFromBytes:

let mut buf = [0; size_of::<Compression>()];
reader.read_exact(&mut buf)?;
Compression::try_read_from_bytes(&buf)

This is now several lines every time I want to read an enum. It's difficult to make this generic as the below function fails to compile with constant expression depends on a generic parameter (@joshlf pointed out #248 here). Using vec![] works but then it's not allocated on the stack.

fn enum_value<T: KnownLayout + TryFromBytes + Sized>(reader: &mut impl Read) -> Result<T> {
    let mut buf = [0; size_of::<T>()];
    reader.read_exact(&mut buf)?;
    T::try_read_from_bytes(&buf)
}

I've also considered a macro, which does work, but it's not syntactically as nice as it could be.

macro_rules! enum_value {
    ($reader:expr, $ty:ty) => {{
        let mut buf = [0; size_of::<$ty>()];
        $reader.read_exact(&mut buf)?;
        <$ty>::try_read_from_bytes(&buf)
    }};
}

I've considered using try_transmute!() but it still requires me to read the correct size of the data and ultimately, it would be the easiest for me if there was a way to interpret bytes straight from the reader.

russellbanks avatar Oct 10 '24 15:10 russellbanks

More prior art: The ReadBytesExt and WriteBytesExt traits from the byteorder crate as mentioned in https://github.com/google/zerocopy/issues/438#issuecomment-2451702061.

joshlf avatar Nov 01 '24 20:11 joshlf

I find myself doing this recently to mimic read_from_io:

pub fn try_read_from_io<R>(mut src: R) -> io::Result<Self>
where
    Self: Sized,
    R: io::Read,
{
    let mut buf = [0; size_of::<Self>()];
    src.read_exact(&mut buf)?;
    Self::try_read_from_bytes(&buf)
        .map_err(|err| io::Error::new(io::ErrorKind::InvalidData, err.to_string()))
}

It has to be implemented for every struct I want to use it for though so I'd love to see a try_read_from_io function (or try_read_from if the names are going that way in 0.9).

russellbanks avatar Jul 02 '25 10:07 russellbanks

I've put up #2619, which is a draft implementation of TryFromBytes::try_read_from_io.

joshlf avatar Jul 02 '25 21:07 joshlf