Deprecate error-prone readAll() style APIs and replace with Reader.streamUntilEof()
The readAll() functions seem to cause significant confusion and doubtless lead to people writing bugs if they are not aware that of the silent truncation behavior if the provided buffer is not large enough to hold the entire file.
One attempt to fix this was: https://github.com/ziglang/zig/pull/20017, but I do not think this is the correct approach to take design-wise. (See https://github.com/ziglang/zig/pull/20017#pullrequestreview-2584138087)
The path I personally think makes the most sense here would be to remove existing readAll() style functions and replace them with a single std.io.Reader.streamUntilEof() similar to the very nice and flexible streamUntilDelimiter() API that was proposed in https://github.com/ziglang/zig/issues/15528 and implemented in https://github.com/ziglang/zig/pull/15762.
Similar to the examples given in that proposal, streamUntilEof() in combination with an std.BoundedArray() would be able to completely replace readAll().
The cases that are a bit trickier to handle with streamUntilEof() include std.fs.File.preadAll(), readvAll(), and preadvAll().
I think that preadAll() could be replaced with a std.fs.File.preader(offset) function that returns a reader which starts at the given offset and uses pread syscalls. This could then be used with the general Reader.streamUntilEof() function proposed above.
For File.readvAll() and File.preadvAll() it wouldn't make any sense to try and expose an equivalent Reader API. Instead, I think a simple rename plus better documentation would suffice. I propose File.readvFillBuffers() and File.preadvFillBuffers() for new names that don't imply the entire file will be read if the file size is larger than the buffer size.
Look at that, someone happens to have seemingly independently submitted a PR implementing streamUntilEof() just now: https://github.com/ziglang/zig/pull/22681
Thanks for picking up on this issue.
I agree that readAll and friends are misleadingly named, but there are several reasons I disagree with their removal.
-
streamUntilEofis less convenient to use thanreadAll. People WILL usereadand not handle short reads, which is a common footgun. stdlib APIs should be hard to misuse, and this change is the opposite of that. See also: https://github.com/DarkPlacesEngine/DarkPlaces/pull/88 -
streamUntilEofnecessarily introduces an extra copy, which is useful forstreamUntilDelimiter, but only hurts performance here. -
streamUntilEofintroduces an extra writer. LLVM is often unable to devirtualize and inline well enough to hide the overhead of using a function pointer. See also: #22322 for performance problems caused byReaderandWriterand #17985 for performance problems caused bystreamUntilDelimiter.
Removing readAll and friends outright would make Zig code less safe by making it harder to deal with short reads, and the fact that streamUntilEof is necessarily slower than readAll means people will end up reimplementing it themselves.
How about readAll is renamed to something like fill or fillSlice? That way, it's not confused with reading the entirety of a stream.
streamUntilEof basically already exists -- it's called LinearFifo(u8, ...).pump:
pub fn main() !void {
// send stdin to stdout
const stdin = std.io.getStdIn();
const stdout = std.io.getStdOut();
var fifo: std.fifo.LinearFifo(u8, .{ .Static = 1024 }) = .init();
try fifo.pump(stdin, stdout);
}
const std = @import("std");
The LinearFifo acts as the temporary buffer; we read into the FIFO buffer, then write that data back out.
I'm strongly against the outright removal of readAll; it's a quite common operation, and the alternatives are quite clunky. I'd be okay with wrapping the above (or something similar, the impl is pretty simple) into a convenience function on Reader.
@notcancername you make some good points. Perhaps a simple rename + better docs is indeed the best option here. I like readFillSlice a bit more than the plain fillSlice myself.
Perhaps it would also be good to rename the higher level read() functions for File and Reader to readPartial() to further disambiguate the two (the low level std.posix.read() should probably continue to match the name of the syscall).
I'm strongly against the outright removal of readAll; it's a quite common operation, and the alternatives are quite clunky.
From grepping around the zig source tree, it doesn't seem to be quite as common as I expected. Furthermore, many of the uses are immediately followed by an if statement checking for truncation, which is also clunky. The use cases that do not have such a check could probably use a careful audit.
I don't think readAll should be changed to readPartial. That will probably lead people to use read instead. Technically, the read() function is the one that's partial. readAll() writes all that could be read to the buffer. The documentation pretty clearly states that "If the number read is smaller than buffer.len, it means the stream reached the end."
I landed here looking for the documentation of readAll to see how it works. (Ultimately this issue might just be a documentation one). A function that is "read the number of bytes I know in advance should be there or return an error" is what I am looking for. It seems like that is what readAll is designed for. Perhaps a name change is definitely in order.
Just brainstorming, perhaps readSlice is another option.
A function that is "read the number of bytes I know in advance should be there or return an error" is what I am looking for.
This is provided by readNoEof. Plain read can read any number of bytes, and readAll will return the number of bytes read on EOF.
I believe this was addressed by #24329