io: Scatter/gather / zero-copy APIs
While checking with the Ariel OS team whether anything is amiss in embedded-io, a request for vectored I/O came up, revolving around the question "Are those APIs suitable for avoiding needless intermediate copies?".
There is some vagueness in what this entails; some aspects that seem important are:
- Doing the equivalent of `std::io::Write::write_vectored` is almost trivial because we write to `&mut self` (so there's no danger of interleaving), and barring the optimization of using a single syscall, a `write_vectored` can be provided by calling single writes.
- Same goes for a simple `read_vectored` that merely writes to multiple caller-allocated buffers.
- A more powerful form of a vectored read (call it zero-copy read) would produce an iterator view of the data as it sits, e.g., in some TCP buffer; that data may be non-contiguous, e.g., when straddling a ring buffer's wraparound. Unlike the `std::io` form, it would not populate the moral equivalent of `&mut [&mut [u8]]`, but would return roughly an `impl Iterator<Item = &[u8]>`.
- There might be a powerful form of Write as well, where some longer-lived type (something something `AsRef<[u8]>`) is sent to the Writer, which may then consume it at the rate the device writes, but it is uncertain whether that is relevant.
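To make the zero-copy read more concrete, here is a minimal sketch of a ring buffer whose unread contents are exposed as an `impl Iterator<Item = &[u8]>` yielding at most two slices (two when the data straddles the wraparound). `RingBuffer`, `push`, `chunks` and `consume` are hypothetical names chosen for illustration; they are not part of embedded-io or of any TCP stack's API.

```rust
/// Fixed-capacity byte ring buffer, standing in for e.g. a TCP stack's
/// receive buffer. Hypothetical type for illustration only. `N` must be > 0.
pub struct RingBuffer<const N: usize> {
    storage: [u8; N],
    head: usize, // index of the oldest unread byte
    len: usize,  // number of unread bytes
}

impl<const N: usize> RingBuffer<N> {
    pub const fn new() -> Self {
        Self { storage: [0; N], head: 0, len: 0 }
    }

    /// Appends as many bytes as fit; returns how many were accepted.
    pub fn push(&mut self, data: &[u8]) -> usize {
        let n = data.len().min(N - self.len);
        for (i, &b) in data[..n].iter().enumerate() {
            self.storage[(self.head + self.len + i) % N] = b;
        }
        self.len += n;
        n
    }

    /// Zero-copy view: the unread data as at most two slices borrowing the
    /// internal storage (two when it wraps around the end of `storage`).
    pub fn chunks(&self) -> impl Iterator<Item = &[u8]> {
        let first_len = self.len.min(N - self.head);
        let first = &self.storage[self.head..self.head + first_len];
        let second = &self.storage[..self.len - first_len];
        [first, second].into_iter().filter(|c| !c.is_empty())
    }

    /// Marks `n` bytes as read, freeing room for new data.
    pub fn consume(&mut self, n: usize) {
        let n = n.min(self.len);
        self.head = (self.head + n) % N;
        self.len -= n;
    }
}
```

Since `chunks` only borrows `&self`, the data has to be released explicitly with `consume` once processed; any real API along these lines would need a similar two-step contract.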
The most pressing point, given my current context of https://github.com/rust-embedded/embedded-hal/issues/566 (embedded-io 1.0), is: would any of that need breaking changes if we released the current main as 1.0?
I think not, because:
- For the simple forms, `read_vectored`/`write_vectored` could be added to the trait at a later time as provided methods.
- For the zero-copy read, a user implementing an application on a generic `embedded_io::Read` implementation could not know whether vectored reads are available; for all they know, the reader most likely comes from a byte-oriented device that has no buffers of its own. (From that point of view, those who use embedded-io as the interface to a TCP connection are the odd ones out, because their reads are actually buffered internally in the network stack.) Nonetheless, an interface could be added as a provided method: the user would provide a fallback buffer to read into, which the provided method would populate through a regular read call. This may look wasteful in light of use with zero-copy-capable backends (after all, the user may be motivated not by the CPU cycles copying takes, but by limited stack space), but an example implementation at https://godbolt.org/z/heEs3jG64 shows that through monomorphization, if the concrete Read impl's method never produces the "I wrote to your buffer" outcome, that allocation is dead code and gets eliminated at build time.
- I expect that any shenanigans with a powerful Write would work with provided methods that simply block right away until the data is written; at worst, they'd need an associated type, which would have to come with a default to avoid breaking existing implementors, and would thus depend on the stabilization of associated type defaults (https://github.com/rust-lang/rust/issues/29661).
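A rough sketch of the provided-method argument for the zero-copy read, assuming a simplified stand-in for `embedded_io::Read` (a local trait with an associated `Error` instead of the real `ErrorType` supertrait). The method name `read_zero_copy`, the `ReadOutcome` enum and the two reader types are hypothetical, and the sketch returns a single contiguous slice rather than an iterator to stay short. A reader backed by already-filled memory overrides the method and never touches the fallback buffer; a byte-oriented device just inherits the default, which performs a regular `read` into it.

```rust
use core::convert::Infallible;

/// Where the data handed out by `read_zero_copy` lives (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ReadOutcome<'a> {
    /// Borrowed straight from the reader's own buffer (zero-copy path).
    Borrowed(&'a [u8]),
    /// Copied into the caller's fallback buffer by a plain `read`.
    Copied(&'a [u8]),
}

/// Simplified stand-in for `embedded_io::Read`, plus one hypothetical
/// provided method.
pub trait Read {
    type Error;

    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Error>;

    /// Default: read into the caller's fallback buffer. Existing impls inherit
    /// this automatically, so adding the method later is non-breaking.
    fn read_zero_copy<'a>(
        &'a mut self,
        fallback: &'a mut [u8],
    ) -> Result<ReadOutcome<'a>, Self::Error> {
        let n = self.read(fallback)?;
        Ok(ReadOutcome::Copied(&fallback[..n]))
    }
}

/// Reader over memory a peripheral or driver already filled (hypothetical).
pub struct SliceReader<'d> {
    data: &'d [u8],
}

impl<'d> Read for SliceReader<'d> {
    type Error = Infallible;

    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Error> {
        let n = buf.len().min(self.data.len());
        let (head, tail) = self.data.split_at(n);
        buf[..n].copy_from_slice(head);
        self.data = tail;
        Ok(n)
    }

    /// Override: hand out everything buffered in place; `_fallback` is unused.
    fn read_zero_copy<'a>(
        &'a mut self,
        _fallback: &'a mut [u8],
    ) -> Result<ReadOutcome<'a>, Self::Error> {
        Ok(ReadOutcome::Borrowed(core::mem::take(&mut self.data)))
    }
}

/// Byte-at-a-time device without a buffer of its own (hypothetical).
pub struct ByteDevice {
    next: u8,
}

impl Read for ByteDevice {
    type Error = Infallible;

    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Error> {
        // Like a UART: hand over one byte per call.
        match buf.first_mut() {
            Some(b) => {
                *b = self.next;
                self.next = self.next.wrapping_add(1);
                Ok(1)
            }
            None => Ok(0),
        }
    }
}

/// Application code generic over any reader: sums the next bytes received.
pub fn sum_next<R: Read>(reader: &mut R) -> Result<u32, R::Error> {
    // For `SliceReader` this array is never written to.
    let mut fallback = [0u8; 64];
    let data = match reader.read_zero_copy(&mut fallback)? {
        ReadOutcome::Borrowed(d) | ReadOutcome::Copied(d) => d,
    };
    Ok(data.iter().map(|&b| u32::from(b)).sum())
}
```

Whether the optimizer actually removes the unused `fallback` array in the `SliceReader` instantiation depends on inlining; the godbolt link is the evidence for that claim, not this sketch.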
So my questions to the community / maintainers are:
- Is this something you'd like to see considered / added / would review PRs for? Note that while this is primarily about embedded-io now, it may also help us build the equivalent tools for embedded-nal's UDP side. (Its TCP side would already profit from these methods.)
- Do you agree that anything that'd be added could be added after a 1.0 release, as argued above?
CC'ing @kaspar030 with whom I've discussed this for the Ariel use cases.
What are our goals?
- Doing DMA straight to/from user's buffers? The current Read, Write traits can already do that (with some caveats[^1], but write_vectored/read_vectored would also suffer from those)
- Doing scatter/gather DMA straight to/from user's buffers? For this we do need write_vectored/read_vectored. How common is it though? Scatter/gather DMA hardware is somewhat rare especially with the smaller MCUs.
- Skipping a copy between the user's buffer and the reader/writer internal buffer? We already have BufRead which allows accessing the data straight from the internal buffer. We could have BufWrite as well. They're a bit hard to use though.
- Save system call overhead? This is the reason writev/readv exist on Linux, it allows doing 1 system call instead of N because system calls on desktop OSs are slow. This effect is not present on embedded (unless you're using an RTOS with userspace/kernelspace split which is rare)
- Something else...?
[^1]: The caveats are: 1. for UART reads you need to use RTS/CTS to control when bytes come in; 2. for async you need to accept that `mem::forget`ting the future is unsound, but that's already the case for the embedded-hal async traits.
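For comparison with the zero-copy read idea, a sketch of consuming data in place through a `fill_buf`/`consume` pair, mirroring the shape of embedded-io's `BufRead`. The trait is redefined locally (with its own `Error` type rather than the `ErrorType` supertrait) so the snippet is self-contained; `Frames` and `count_byte` are hypothetical, with `Frames` standing in for a reader over frames a driver has already deposited in memory. Non-contiguous data is handled by returning one contiguous chunk per `fill_buf` call.

```rust
use core::convert::Infallible;

/// Local mirror of the `fill_buf`/`consume` shape of embedded-io's `BufRead`.
pub trait BufRead {
    type Error;
    /// Returns the next contiguous chunk of buffered data (empty at EOF).
    fn fill_buf(&mut self) -> Result<&[u8], Self::Error>;
    /// Marks `amt` bytes of the last chunk as consumed.
    fn consume(&mut self, amt: usize);
}

/// Data a driver already deposited as separate frames (hypothetical).
pub struct Frames<'d> {
    frames: &'d [&'d [u8]],
    pos: usize, // read position inside the first remaining frame
}

impl<'d> BufRead for Frames<'d> {
    type Error = Infallible;

    fn fill_buf(&mut self) -> Result<&[u8], Self::Error> {
        loop {
            let frames: &'d [&'d [u8]] = self.frames;
            match frames.split_first() {
                None => return Ok(&[]),
                Some((&first, rest)) => {
                    if self.pos < first.len() {
                        // Hand out the frame's unread part without copying.
                        return Ok(&first[self.pos..]);
                    }
                    // This frame is exhausted: move on to the next one.
                    self.frames = rest;
                    self.pos = 0;
                }
            }
        }
    }

    fn consume(&mut self, amt: usize) {
        self.pos += amt;
    }
}

/// Counts occurrences of `needle`, reading the data where it already lies.
pub fn count_byte<R: BufRead>(reader: &mut R, needle: u8) -> Result<usize, R::Error> {
    let mut count = 0;
    loop {
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            return Ok(count); // end of stream
        }
        count += chunk.iter().filter(|&&b| b == needle).count();
        let len = chunk.len();
        reader.consume(len);
    }
}
```

This is arguably where the "hard to use" part shows: the borrow returned by `fill_buf` has to end before `consume` can be called, and a record that is split across two chunks has to be reassembled by the caller.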
Thanks for the quick responses.
The use cases for this are not about DMA (that can be done with the traits well enough), nor about syscall overhead (with modern RTOSs I think the overhead is not big enough to warrant the complexity). They are about streams that the peripheral or the drivers have already buffered into memory-mapped locations, and about not using up stack space for something that could be a slice. This applies mainly to network peripherals, on different layers: radio drivers deposit received frames somewhere, and likewise TCP stacks provide embedded-io on reassembled TCP streams.
I wasn't aware of BufRead; we'll look into whether it (and possibly a BufWriter) suffices for those use cases, and then either update this issue with more details or close it.
About BufWrite: the original goal of embedded-io was just "std::io but usable on no-std". Adding stuff to embedded-io that doesn't exist in std::io has some downsides. It's hard or impossible to write adapters to something that doesn't exist. Also, if the feature lands in std::io later with a different design, embedded-io would be inconsistent with it.
So at the very least I'd like to understand why std::io doesn't have BufWrite, before we add it.
Also we can always add it post-1.0, it's backwards compatible since it's a new independent trait.
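To illustrate why a `BufWrite` trait would be a non-breaking addition, here is a sketch of what such a trait might look like. Everything in it is hypothetical: std::io has no `BufWrite` trait, and the method names `buffer`/`commit` are made up by analogy with `fill_buf`/`consume`. `ErrorType` and `Write` are local simplifications of the embedded-io traits, and `TxQueue`/`put_header` are illustrative only. Existing `Write` implementors keep compiling untouched; a driver with an internal transmit buffer can opt into `BufWrite` separately.

```rust
use core::convert::Infallible;

/// Local simplification of embedded-io's shared error-type trait.
pub trait ErrorType {
    type Error;
}

/// Existing trait, unchanged (simplified stand-in for `embedded_io::Write`).
pub trait Write: ErrorType {
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Error>;
}

/// Hypothetical new, independent trait: the caller writes straight into the
/// writer's internal buffer, then commits how many bytes it filled.
pub trait BufWrite: ErrorType {
    /// Returns the free space inside the writer's buffer.
    fn buffer(&mut self) -> Result<&mut [u8], Self::Error>;
    /// Marks the first `amt` bytes of that space as ready to send.
    fn commit(&mut self, amt: usize);
}

/// A driver with a transmit buffer that implements both traits (hypothetical).
pub struct TxQueue<const N: usize> {
    buf: [u8; N],
    len: usize,
}

impl<const N: usize> TxQueue<N> {
    pub fn new() -> Self {
        Self { buf: [0; N], len: 0 }
    }

    /// Bytes queued for transmission.
    pub fn pending(&self) -> &[u8] {
        &self.buf[..self.len]
    }
}

impl<const N: usize> ErrorType for TxQueue<N> {
    type Error = Infallible;
}

impl<const N: usize> Write for TxQueue<N> {
    fn write(&mut self, data: &[u8]) -> Result<usize, Self::Error> {
        // Copying path: accept as much as fits.
        let n = data.len().min(N - self.len);
        self.buf[self.len..self.len + n].copy_from_slice(&data[..n]);
        self.len += n;
        Ok(n)
    }
}

impl<const N: usize> BufWrite for TxQueue<N> {
    fn buffer(&mut self) -> Result<&mut [u8], Self::Error> {
        Ok(&mut self.buf[self.len..])
    }

    fn commit(&mut self, amt: usize) {
        self.len = (self.len + amt).min(N);
    }
}

/// Serializes a 3-byte header directly into the writer's buffer, with no
/// staging copy. Returns `false` if there is not enough free space.
pub fn put_header<W: BufWrite>(w: &mut W, kind: u8, payload_len: u16) -> Result<bool, W::Error> {
    let space = w.buffer()?;
    if space.len() < 3 {
        return Ok(false);
    }
    space[0] = kind;
    space[1..3].copy_from_slice(&payload_len.to_be_bytes());
    w.commit(3);
    Ok(true)
}
```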