libs-team
[ACP] IO traits in `core::io`
Proposal
Problem statement
Now that the basic types for cursor-based IO (`BorrowedBuf` and `BorrowedCursor`) are in `core::io`, I'd like to propose adding a set of low-level I/O traits to `core::io` that mimic `std::io`, but use cursors into borrowed buffers and are parameterized by error type.
I expect this API is big enough that it'll need to be an RFC, but I figured I'd file an ACP first to see if there were any objections to the overall idea / API structure.
Motivating examples or use cases
Being able to write `#![no_std]` libraries that do some sort of I/O is a widespread request. A partial list of use cases includes:
- Embedded systems that have some notion of sequential byte-based I/O via serial, i2c, RNG chips, etc.
  - Libraries intended for these targets frequently use a hardware abstraction library such as https://github.com/rust-embedded/embedded-hal, but ideally they could be coded against a generic Read/Write interface so they're usable in `std` programs also.
- Binaries that run under an OS in a constrained environment, such as Linux's early userspace, use `#![no_std]` to reduce binary size and have more predictable memory allocation patterns. They still need some way to abstract I/O over files, sockets, character devices (`/dev/random`), etc.
- Encode/decode libraries that want to incrementally parse input data or generate output data could be usable in `#![no_std]` builds, but today their streaming interfaces are often behind a `feature = "std"` gate.
The proposed design may also be useful in code that uses `std` but needs precise control over I/O patterns.
Solution sketch
```rust
// core::io

// The `IoBuf` and `IoBufMut` types are similar to `std::io::IoSlice{,Mut}`, but aren't guaranteed
// to have any particular internal layout. `IoBufMut` is also allowed to contain uninitialized excess
// capacity, making it similar to `Vec` (but with fixed maximum length).
//
// The goal is to enable fast paths for hosted targets, with slight opacity to enable `target_os = "none"`.
//
// Then within `std` there can be cheap conversions of &[IoBuf] / &[IoBufMut] <-> &[IoSlice] / &[IoSliceMut],
// which is safe because `std` knows which OS it's built for.
struct IoBuf<'a>;
struct IoBufMut<'a>; // currently named BorrowedBuf

impl<'a> From<&'a [u8]> for IoBuf<'a>;
impl<'a> From<&'a mut [u8]> for IoBufMut<'a>;
impl<'a> From<&'a mut [MaybeUninit<u8>]> for IoBufMut<'a>;

// core::io::core_io_traits (placeholder name: won't conflict with std::io and also very ugly)

// All traits in this module have a bound on IO, which associates an error type.
trait IO {
    type Error;
}

// ReadCursor tracks how much data was read from a single read* call.
// ReadVecCursor is the same, but for vectored reads.
struct ReadCursor<'a>; // currently named BorrowedCursor
struct ReadVecCursor<'a>;

impl IoBufMut<'_> {
    fn unfilled(&mut self) -> ReadCursor<'_>;
    fn unfilled_vec<'a>(bufs: &'a [&'a mut IoBufMut<'_>]) -> ReadVecCursor<'a>;
}

// WriteVecCursor tracks how much data was written in a vectored write.
//
// See https://github.com/rust-lang/rust/issues/70436 for context on why
// this extra glue is useful.
struct WriteVecCursor<'a>;

impl IoBuf<'_> {
    fn unwritten(&self) -> &[u8];
    fn unwritten_vec<'a>(bufs: &'a [&'a IoBuf<'_>]) -> WriteVecCursor<'a>;
}

// Read traits
//
// Read is like std::io::Read, except read_exact() has no default implementation due to
// the opacity of error types.
//
// ReadAt is like (a subset of) std::os::unix::fs::FileExt, and represents types that can read
// at arbitrary offsets. It has a bound on `IO` rather than `Read` because such types don't
// necessarily have the concept of a "current position" (e.g. a block device, or a ROM chip).
//
// Read sizes are provided via the cursors, so all return types are `Result<(), Self::Error>`.
trait Read: IO {
    fn read(&mut self, cursor: &mut ReadCursor) -> Result<(), Self::Error>;
    fn read_exact(&mut self, cursor: &mut ReadCursor) -> Result<(), Self::Error>;
    fn read_vectored(&mut self, cursor: &mut ReadVecCursor) -> Result<(), Self::Error> { ... }
    fn read_vectored_exact(&mut self, cursor: &mut ReadVecCursor) -> Result<(), Self::Error> { ... }
    fn is_read_vectored(&self) -> bool { ... }
}

trait ReadAt: IO {
    fn read_at(&mut self, cursor: &mut ReadCursor, offset: usize) -> Result<(), Self::Error>;
    fn read_exact_at(&mut self, cursor: &mut ReadCursor, offset: usize) -> Result<(), Self::Error>;
    fn read_vectored_at(&mut self, cursor: &mut ReadVecCursor, offset: usize) -> Result<(), Self::Error> { ... }
    fn read_vectored_exact_at(&mut self, cursor: &mut ReadVecCursor, offset: usize) -> Result<(), Self::Error> { ... }
    fn is_read_vectored_at(&self) -> bool { ... }
}

// Write traits -- basically the same story as the read traits, but there's
// no cursor for just plain writes.
pub trait Write: IO {
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Error>;
    fn write_all(&mut self, buf: &[u8]) -> Result<(), Self::Error>;
    fn write_vectored(&mut self, cursor: &mut WriteVecCursor) -> Result<(), Self::Error> { ... }
    fn write_all_vectored(&mut self, cursor: &mut WriteVecCursor) -> Result<(), Self::Error> { ... }
    fn is_write_vectored(&self) -> bool { ... }
    fn flush(&mut self) -> Result<(), Self::Error>;
}

pub trait WriteAt: IO {
    fn write_at(&mut self, buf: &[u8], offset: usize) -> Result<usize, Self::Error>;
    fn write_all_at(&mut self, buf: &[u8], offset: usize) -> Result<(), Self::Error>;
    fn write_vectored_at(&mut self, cursor: &mut WriteVecCursor, offset: usize) -> Result<(), Self::Error> { ... }
    fn write_all_vectored_at(&mut self, cursor: &mut WriteVecCursor, offset: usize) -> Result<(), Self::Error> { ... }
    fn is_write_vectored_at(&self) -> bool { ... }
}

// To provide somewhat friendlier UX, a wrapper struct can be used to provide various
// parameter formats. For example the `std::io::Read` trait methods `read` and `read_buf`
// could be represented 1:1 here, without expanding the trait's API.
pub struct Reader<'a, R>;

impl<'a, R> Reader<'a, R> {
    pub fn new(r: &'a mut R) -> Reader<'a, R>;
}

impl<R: Read> Reader<'_, R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, R::Error>;
    fn read_exact(&mut self, buf: &mut [u8]) -> Result<(), R::Error>;
    fn read_buf(&mut self, buf: &mut ReadBuf) -> Result<usize, R::Error>;
    fn read_buf_exact(&mut self, buf: &mut ReadBuf) -> Result<(), R::Error>;
}

// Vectored I/O can also be made a bit more ergonomic by using const-sized
// arrays, which would make the traits non-object-safe.
impl<R: ReadVectored> Reader<'_, R> {
    fn read_vectored<const N: usize>(&mut self, bufs: [&'a mut [u8]; N]) -> Result<usize, R::Error>;
    fn read_buf_vectored(&mut self, buf: &mut ReadVecBuf) -> Result<usize, R::Error>;
    // ...
}

// same for `struct Writer<'a, W>`
```
Alternatives
- Move `std::io` traits into `core`: https://github.com/rust-lang/rust/issues/48331
  - This approach is hard-blocked by the design of `std::io::Error` [0], and doesn't seem likely to land in the foreseeable future.
- Do nothing, and hope for a third party to write a library of equivalent traits that can be broadly adopted by the community.
- Do nothing, and encourage authors of `#![no_std]` libraries to define their own I/O traits (leaving consumers to write the glue code).
- Something smaller? Such as defining just the `Read`/`Write` traits, in the idiom of Go, and leaving the scatter/gather IO to another day.
- ?

[0] https://github.com/rust-lang/project-error-handling/issues/11
Links and related work
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capacity becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
demo of moving `std::io::Error` into `core`: https://github.com/rust-lang/project-error-handling/issues/11#issuecomment-1761019324
I understand the desire to enforce a common error for types that implement `Read` and `Write`, but honestly, I don't think that the `IO` trait justifies its existence. Honestly, the ability to separate out read and write errors feels like a feature, more than anything.

Assuming that we don't go the path of having a `core::io::Error`, I would say that we should make `Error` part of the `Read` and `Write` traits directly.
> I understand the desire to enforce a common error for types that implement `Read` and `Write`, but honestly, I don't think that the `IO` trait justifies its existence. Honestly, the ability to separate out read and write errors feels like a feature, more than anything.
>
> Assuming that we don't go the path of having a `core::io::Error`, I would say that we should make `Error` part of the `Read` and `Write` traits directly.
The problem with giving `Read` and `Write` their own error type is that it prevents trait composition. Consider types that can be both `Read` and `Write`, such as serial devices or network sockets.
```rust
trait Read {
    type Error;
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Error>;
}

trait Write {
    type Error;
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Error>;
}

trait Socket: Read + Write {
    // error[E0221]: ambiguous associated type `Error` in bounds of `Self`
    fn set_nodelay(&self, nodelay: bool) -> Result<(), Self::Error>;
}

// error[E0221]: ambiguous associated type `Error` in bounds of `S`
fn netboot<S: Socket>(tftp_socket: &mut S) -> Result<(), S::Error> { ... }
```
You end up having to give a new name to the error type at each level of trait definition, which gets out of hand quickly and causes diffs to get really unpleasant when refactoring.
```rust
trait Socket: Read<Error = Self::SocketError> + Write<Error = Self::SocketError> {
    type SocketError;
}

fn netboot_v1<S, E>(tftp_socket: &mut S) -> Result<(), E>
where S: Read<Error = E> + Write<Error = E>,
{ ... }

// forced to change the error type for a minor refactoring
fn netboot_v2<S: Socket>(tftp_socket: &mut S) -> Result<(), S::SocketError> { ... }
```
In principle I would want to define the `Socket` trait like this, except the syntax isn't valid:

```rust
trait<E> Socket: Read<Error = E> + Write<Error = E> {}
```
And trying to imitate it with type parameters on the trait (`Socket<E>`) doesn't work -- the type checker doesn't seem to be able to recognize that all definitions of `S::Error` have been unified.

```rust
trait Socket<E>: Read<Error = E> + Write<Error = E> { }

/*
error[E0221]: ambiguous associated type `Error` in bounds of `S`
  | type Error;
  | ---------- ambiguous `Error` from `Read`
  | type Error;
  | ---------- ambiguous `Error` from `Write`
*/
fn netboot<E, S: Socket<E>>(tftp_socket: &mut S) -> Result<(), S::Error> { ... }
```
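For contrast, a minimal self-contained sketch (trait and type names here are illustrative stand-ins, not the proposed API) showing that hoisting the error type into a single shared supertrait makes the `Socket` bound compile cleanly:

```rust
// Stand-in for the proposed `IO` supertrait; the single `Error` declaration
// is what makes `Self::Error` unambiguous below.
pub trait Io {
    type Error;
}

pub trait Read: Io {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Error>;
}

pub trait Write: Io {
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Error>;
}

// No E0221 here: `Error` is declared only once, in `Io`.
pub trait Socket: Read + Write {
    fn set_nodelay(&mut self, nodelay: bool) -> Result<(), Self::Error>;
}

pub fn handshake<S: Socket>(socket: &mut S) -> Result<(), S::Error> {
    socket.set_nodelay(true)?;
    socket.write(b"hello")?;
    let mut buf = [0u8; 8];
    socket.read(&mut buf)?;
    Ok(())
}

// A toy in-memory socket showing the bound is usable.
#[derive(Default)]
pub struct Loopback {
    buf: Vec<u8>,
}

impl Io for Loopback {
    type Error = ();
}

impl Read for Loopback {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, ()> {
        let n = buf.len().min(self.buf.len());
        buf[..n].copy_from_slice(&self.buf[..n]);
        self.buf.drain(..n);
        Ok(n)
    }
}

impl Write for Loopback {
    fn write(&mut self, buf: &[u8]) -> Result<usize, ()> {
        self.buf.extend_from_slice(buf);
        Ok(buf.len())
    }
}

impl Socket for Loopback {
    fn set_nodelay(&mut self, _nodelay: bool) -> Result<(), ()> {
        Ok(())
    }
}
```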
Well, composition is not impossible, it just becomes annoying. You could do something like this:

```rust
trait Socket: Read + Write {
    fn set_nodelay(&self, nodelay: bool) -> Result<(), <Self as Read>::Error>;
}
```

Or if it can return both errors, something like this:

```rust
enum SocketError<R, E> {
    ReadError(R),
    WriteError(E),
}

trait Socket: Read + Write {
    fn set_nodelay(&self, nodelay: bool) -> Result<(), SocketError>;
}
```
Other than this point, there are a couple of things I can think of:
- We should probably also consider `std::io::Read` and `std::io::Write` being wrappers to these traits. This way, we don't have to duplicate the implementation, and it opens up the possibility of sharing code between `no_std` and `std` environments.
- We also need to decide if we want to have default implementations for methods like `read_exact`, `read_vectored`, `write_exact`, `write_vectored` and so on, like `std::io::Read` and `std::io::Write` currently do. If we want to have something like this, then we have two special errors that need to be expressed as part of the above API:
  - Read/Write interrupted, just like in case of `std::io::Read` and `std::io::Write`.
  - Ran out of memory. This may not always be an error, especially in case of this being an implementation for `std::io::Write` and `std::io::Read`, since they can easily extend the buffer.
Right, I think that wanting to reuse the same error for `Read` and `Write` is reasonable but forcing them to have the same error doesn't really solve the problem, since people still can inject their own errors into the mix as well. Consider the common case of read, parse, then write; unless you have some standard `io::Error` type, you have to make your own error for parsing, and it makes the most sense if you're using a custom `Error` type on the trait to simply have a ternary enum like @adityashah1212 mentioned:

```rust
pub enum MyError<R, W> {
    Read(R),
    Parse(ParseError),
    Write(W),
}
```
If we added (but didn't enforce) an `io::Error` type we could do this, using associated type bounds for brevity:

```rust
fn read_parse_write<
    R: Read<Error: Into<IoError>>,
    W: Write<Error: Into<IoError>>,
>(
    read: &mut R,
    write: &mut W,
) -> Result<(), IoError>;
```
> Well, composition is not impossible, it just becomes annoying. You could do something like this:
>
> ```rust
> trait Socket: Read + Write { fn set_nodelay(&self, nodelay: bool) -> Result<(), <Self as Read>::Error>; }
> ```

This signature is incorrect, because in a design with separate error types for categories of I/O operations, `Socket::set_nodelay` wouldn't return a read error -- it would return a `SocketConfigError` or something like that.
> Or if it can return both errors, something like this
I understand this works from the "it typechecks and compiles" perspective, but it's not an API you'd want to work with in real life. The low-level I/O traits don't have enough information to correctly assign semantics to specific error codes -- that requires some higher-level (usually OS-specific) context.
> We should probably also consider `std::io::Read` and `std::io::Write` being wrappers to these traits. This way, we don't have to duplicate the implementation, and it opens up the possibility of sharing code between `no_std` and `std` environments.
You'd need adapter types, since otherwise there's a lot of thorny questions about trait coherence. For example `std::io::Write` has an impl for `&mut Vec`, but it would also make sense to have `impl core_io_traits::Write for &mut Vec { type Error = TryReserveError; ... }` -- a blanket implementation (in either direction) would prevent that.

On the plus side, wrapper structs `CoreToStd<C>` and `StdToCore<S>` are both straightforward to implement, and could live in `std::io`.
> We also need to decide if we want to have default implementations for methods like `read_exact`, `read_vectored`, `write_exact`, `write_vectored` and so on, like `std::io::Read` and `std::io::Write` currently do. If we want to have something like this, then we have two special errors that need to be expressed as part of the above API
`read_exact` and `write_all` don't have default implementations in this design, because implementing them requires knowledge of the error code semantics.

The `*_vectored` methods do have default implementations, because they can be implemented in terms of the other methods without inspecting error conditions:
- `write_vectored`: call `write()` with the first iobuf, return the result. Still a single syscall (or interrupt, or whatever), and a partial write is permitted by the API.
- `write_all_vectored`: call `write_all()` on each iobuf until it returns an error. No need to inspect the error.
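The `write_all_vectored` case can be sketched as a runnable, self-contained example (names are hypothetical stand-ins, and plain byte slices replace the proposed cursor types): the default implementation forwards to `write_all` and propagates the first error untouched, so the trait never needs to understand the error's semantics.

```rust
// Stand-in for the proposed `IO` supertrait.
pub trait Io {
    type Error;
}

pub trait Write: Io {
    fn write_all(&mut self, buf: &[u8]) -> Result<(), Self::Error>;

    // Default: write each buffer in turn; the first error is returned
    // unmodified, with no inspection of what it means.
    fn write_all_vectored(&mut self, bufs: &[&[u8]]) -> Result<(), Self::Error> {
        for buf in bufs {
            self.write_all(buf)?;
        }
        Ok(())
    }
}

// A toy writer that cannot fail, to exercise the default method.
pub struct Sink(pub Vec<u8>);

impl Io for Sink {
    type Error = core::convert::Infallible;
}

impl Write for Sink {
    fn write_all(&mut self, buf: &[u8]) -> Result<(), Self::Error> {
        self.0.extend_from_slice(buf);
        Ok(())
    }
}
```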
> - Read/Write interrupted, just like in case of `std::io::Read` and `std::io::Write`.
> - Ran out of memory. This may not always be an error, especially in case of this being an implementation for `std::io::Write` and `std::io::Read`, since they can easily extend the buffer.
As written above, I think the proposed traits would sit below the layer at which semantics can be assigned to error codes. You'd need the type itself to tell you what the semantics are.

For example, you could imagine something like this:

```rust
enum IoError {
    Read(ReadError),
    Write(WriteError),
    Other(i32),
}

struct Serial { ... }

impl core_io_traits::IO for Serial { type Error = IoError; }
impl core_io_traits::Read for Serial { ... /* read() returns IoError::Read on error */ }
```

So the trait itself doesn't mandate the structure of the error codes, but the implementation (which is likely to be platform- and consumer-specific) would.
> Right, I think that wanting to reuse the same error for `Read` and `Write` is reasonable but forcing them to have the same error doesn't really solve the problem, since people still can inject their own errors into the mix as well. Consider the common case of read, parse, then write; unless you have some standard `io::Error` type, you have to make your own error for parsing, [...]
This feels like it's a design proposal for an ACP that moves `std::io::Error` into core so that the implementations of `Read`/`Write` can be generic. I'm not against that idea, exactly (it does seem like a huge amount of work), but it's separate from what I'm proposing here.

Currently, there are no traits in `core` to represent bulk read/write to byte streams. The closest equivalents are `Iterator` and `Extend` -- you can do something like `type Read<E> = Iterator<Item = Result<u8, E>>`, but the performance and generated assembly is bad. There's also `core::fmt::Write`, but it's limited to UTF-8.
The code structure I want to enable is something like this:

```rust
// crate that implements a kernel
struct Serial;
struct IoError(i32);

impl core_io_traits::IO for Serial { type Error = IoError; }
impl core_io_traits::Read for Serial { ... }
impl core_io_traits::Write for Serial { ... }

// crate that implements SLIP
fn recv_packet<'a, R: core_io_traits::Read>(r: &mut R, buf: &'a mut [u8]) -> Result<Packet<'a>, R::Error>;
fn send_packet<W: core_io_traits::Write>(w: &mut W, packet: Packet<'_>) -> Result<(), W::Error>;
```
The SLIP code doesn't need to know anything about the structure of the IO errors it might encounter; it just needs to pass them through. There's a huge amount of library code that looks like this -- it needs to be able to incrementally send/receive bytes, and can't do anything about errors except pass them along.
For what it's worth, @programmerjake did mention potentially moving `io::Error` into core as an option: https://github.com/rust-lang/project-error-handling/issues/11#issuecomment-1761019324

But also, I think that some of the adjacent types, like `io::ErrorKind`, should definitely be in core, since they're generic and don't depend on any allocations.
Yes, I'm aware there's broad interest in moving `std::io::Error` into `core` -- note how I linked that exact issue in the first post.

I don't think this proposal blocks or is blocked by moving `std::io::Error`, and even in a world where `core::io::Error` exists I would not want the proposed `Read`/`Write` traits to have a hard dependency on it.
> You'd need adapter types, since otherwise there's a lot of thorny questions about trait coherence. For example `std::io::Write` has an impl for `&mut Vec`, but it would also make sense to have `impl core_io_traits::Write for &mut Vec { type Error = TryReserveError; ... }` -- a blanket implementation (in either direction) would prevent that.
@jmillikin, I haven't really looked too deep into this, so I might be missing stuff, but what I was proposing was to have `core::io::Write` as the base implementation and have a blanket implementation of `std::io::Write` on top of `core::io::Write`. So as long as `core::io::Write` is implemented for `&mut Vec`, a `std::io::Write` implementation is also present. This will pose one problem though: we won't be able to have both `core::io::Write` and `std::io::Write` implemented for the same type, which may be required in some cases for optimized implementations (?). Though, I think based on what you are saying, this may be a bit out of scope for this one.
> `read_exact` and `write_all` don't have default implementations in this design, because implementing them requires knowledge of the error code semantics.
>
> The `*_vectored` methods do have default implementations, because they can be implemented in terms of the other methods without inspecting error conditions:
>
> `write_vectored`: call `write()` with the first iobuf, return the result. Still a single syscall (or interrupt, or whatever), and a partial write is permitted by the API. `write_all_vectored`: call `write_all()` on each iobuf until it returns an error. No need to inspect the error.
I understand. If this only wants to be a bare-bones trait agnostic of any failures, it should be fine. Though I think it will be far from enough for most use cases, and users will still have to have super-traits on top of these for them to be usable.

Though I am still with @clarfonthey on not having the `IO` trait. In fact, since you say this is bare-bones and only implements the most generic parts of `Read` and `Write`, it further strengthens his argument. Since we are not going to deal with errors on our end, we might as well completely leave it to users. In case of a device doing both `Read` and `Write`, users can choose to have different errors for each and then have a composite enum, or choose to have the same error for both traits and then handle them according to their needs.
> @jmillikin, I haven't really looked too deep into this, so I might be missing stuff, but what I was proposing was to have `core::io::Write` as the base implementation and have a blanket implementation of `std::io::Write` on top of `core::io::Write`. [...]
I understand what you're asking for; the problem is trait coherence rules and backwards compatibility.

A plausible implementation of `core_io_traits::Write` for `Vec<u8>` would be:

```rust
impl core_io_traits::Write for Vec<u8> {
    type Error = crate::collections::TryReserveError;

    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Error> {
        self.try_reserve(buf.len())?;
        self.extend_from_slice(buf);
        Ok(buf.len())
    }
}
```
The blanket implementation of `core_io_traits::Write` -> `std::io::Write` would have to look something like this:

```rust
impl<T, E> std::io::Write for T
where
    T: core_io_traits::Write<Error = E>,
    E: Into<std::io::Error>,
{
    fn write(&mut self, buf: &[u8]) -> Result<usize, std::io::Error> {
        core_io_traits::Write::write(self, buf).map_err(Into::into)
    }
}
```
And now you're stuck, because the existing implementation of `impl std::io::Write for Vec<u8>` panics if allocation fails. There's no way to maintain the current behavior and `impl core_io_traits::Write for Vec<u8>` and have a blanket impl of `std::io::Write`.

Thus, adapter types are more practical:
```rust
struct CoreToStd<C>(C);

impl<E, C> std::io::Read for CoreToStd<C>
where
    C: core_io_traits::Read<Error = E>,
    E: Into<std::io::Error>,
{
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, std::io::Error> {
        core_io_traits::Read::read(&mut self.0, buf).map_err(Into::into)
    }
}

impl<E, C> std::io::Write for CoreToStd<C>
where
    C: core_io_traits::Write<Error = E>,
    E: Into<std::io::Error>,
{
    fn write(&mut self, buf: &[u8]) -> Result<usize, std::io::Error> {
        core_io_traits::Write::write(&mut self.0, buf).map_err(Into::into)
    }

    fn flush(&mut self) -> Result<(), std::io::Error> {
        core_io_traits::Write::flush(&mut self.0).map_err(Into::into)
    }
}
```
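Since `core_io_traits` doesn't exist yet, the adapter pattern can be demonstrated today with a local stand-in trait (all names here -- `CoreWrite`, `VecSink`, `SinkError` -- are hypothetical, not part of the proposal); the interesting part is the `Into<std::io::Error>` bound on the adapter impl:

```rust
// Hypothetical stand-in for the proposed core_io_traits::Write.
pub trait CoreWrite {
    type Error;
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Error>;
    fn flush(&mut self) -> Result<(), Self::Error>;
}

// The adapter: any core-style writer whose error converts into
// std::io::Error can be used where std::io::Write is expected.
pub struct CoreToStd<C>(pub C);

impl<C> std::io::Write for CoreToStd<C>
where
    C: CoreWrite,
    C::Error: Into<std::io::Error>,
{
    fn write(&mut self, buf: &[u8]) -> Result<usize, std::io::Error> {
        CoreWrite::write(&mut self.0, buf).map_err(Into::into)
    }
    fn flush(&mut self) -> Result<(), std::io::Error> {
        CoreWrite::flush(&mut self.0).map_err(Into::into)
    }
}

// A toy core-style writer that never actually fails.
pub struct VecSink(pub Vec<u8>);

#[derive(Debug)]
pub struct SinkError;

impl From<SinkError> for std::io::Error {
    fn from(_: SinkError) -> std::io::Error {
        std::io::Error::new(std::io::ErrorKind::Other, "sink error")
    }
}

impl CoreWrite for VecSink {
    type Error = SinkError;
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Error> {
        self.0.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> Result<(), Self::Error> {
        Ok(())
    }
}
```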
> I understand. If this only wants to be a bare-bones trait agnostic of any failures, it should be fine. Though I think it will be far from enough for most use cases, and users will still have to have super-traits on top of these for them to be usable.
Well, that's always the drawback with primitive traits (type-classes) in a nominal type system.

The purpose of traits like `core_io_traits::{Read,Write}` is to name a specific functionality, so that third-party code can refer to that common name. Making those traits very generic means they can be broadly useful by providing a name to that functionality in various contexts ("bytes can be written to a `Vec`, a `File`, a `SerialPort`, a `TwoCupsAndWetString`, ..."). Then a `no_std` library that just needs to read/write bytes to something can be decoupled from the underlying details -- it's really useful for a compression library (for example) to be able to stream data, but it doesn't need to know what it's streaming to/from.

However, the more generic you make a trait, the less information it provides. A Linux network socket, a WASM linear memory page, and a microcontroller's infrared LED have very little in common, so a `Write` trait that can be useful to all of them can't really prescribe any higher-level semantics. It's just strictly "write bytes, get back opaque error".

This is different from the `std::io` traits, which come bundled with a huge amount of implicit context just by being part of `std` -- implementors can assume access to malloc, a specific target OS, semantics around `EINTR`, and so on.
> Though I am still with @clarfonthey on not having the `IO` trait. In fact, since you say this is bare-bones and only implements the most generic parts of `Read` and `Write`, it further strengthens his argument. Since we are not going to deal with errors on our end, we might as well completely leave it to users. In case of a device doing both `Read` and `Write`, users can choose to have different errors for each and then have a composite enum, or choose to have the same error for both traits and then handle them according to their needs.
As mentioned, the problem is composition in libraries. It should be possible for a library to describe necessary I/O functionality with `Read + Write` or `Read + Write + self::SocketOps` or whatever, which is somewhere between impractical and impossible if the error types of each I/O trait are separate.

```rust
// it's really nice to not need a three-line `where` for each function that needs to
// read and write to the same IO thingie.
fn handshake<S: Read + Write>(socket: &mut S) -> Result<(), S::Error> { ... }
```

For cases when there are two separate I/O trait hierarchies involved (such as streaming compression), the concrete types are also necessarily separate, so the `IO` trait doesn't interfere:

```rust
fn copy<R: Read, W: Write>(r: &mut R, w: &mut W) -> Result<(), CopyError<R::Error, W::Error>> { ... }

enum CopyError<RE, WE> {
    ReadError(RE),
    WriteError(WE),
}
```
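To make that last sketch concrete, here is a runnable mini-version of `copy` (with hypothetical stand-in traits, a fixed 64-byte scratch buffer, and a byte count returned instead of `()`): each concrete type brings its own `Io::Error`, and `CopyError` keeps the two sides distinguishable at the call site.

```rust
// Stand-in for the proposed `IO` supertrait.
pub trait Io {
    type Error;
}

pub trait Read: Io {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Error>;
}

pub trait Write: Io {
    fn write_all(&mut self, buf: &[u8]) -> Result<(), Self::Error>;
}

#[derive(Debug)]
pub enum CopyError<RE, WE> {
    ReadError(RE),
    WriteError(WE),
}

// Copy until EOF (read returning 0), tagging each side's error distinctly.
pub fn copy<R: Read, W: Write>(
    r: &mut R,
    w: &mut W,
) -> Result<u64, CopyError<R::Error, W::Error>> {
    let mut total = 0u64;
    let mut buf = [0u8; 64];
    loop {
        let n = r.read(&mut buf).map_err(CopyError::ReadError)?;
        if n == 0 {
            return Ok(total);
        }
        w.write_all(&buf[..n]).map_err(CopyError::WriteError)?;
        total += n as u64;
    }
}

// Toy endpoints: the two concrete types are separate, so their `Io` error
// types are free to differ.
pub struct SliceReader<'a>(pub &'a [u8]);

impl Io for SliceReader<'_> {
    type Error = core::convert::Infallible;
}

impl Read for SliceReader<'_> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Error> {
        let n = buf.len().min(self.0.len());
        buf[..n].copy_from_slice(&self.0[..n]);
        self.0 = &self.0[n..];
        Ok(n)
    }
}

pub struct VecWriter(pub Vec<u8>);

impl Io for VecWriter {
    type Error = core::convert::Infallible;
}

impl Write for VecWriter {
    fn write_all(&mut self, buf: &[u8]) -> Result<(), Self::Error> {
        self.0.extend_from_slice(buf);
        Ok(())
    }
}
```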