zig icon indicating copy to clipboard operation
zig copied to clipboard

Deprecate error-prone readAll() style APIs and replace with Reader.streamUntilEof()

Open ifreund opened this issue 11 months ago • 7 comments

The readAll() functions seem to cause significant confusion and doubtless lead to people writing bugs if they are not aware that of the silent truncation behavior if the provided buffer is not large enough to hold the entire file.

One attempt to fix this was: https://github.com/ziglang/zig/pull/20017, but I do not think this is the correct approach to take design-wise. (See https://github.com/ziglang/zig/pull/20017#pullrequestreview-2584138087)

The path I personally think makes the most sense here would be to remove existing readAll() style functions and replace them with a single std.io.Reader.streamUntilEof() similar to the very nice and flexible streamUntilDelimiter() API that was proposed in https://github.com/ziglang/zig/issues/15528 and implemented in https://github.com/ziglang/zig/pull/15762.

Similar to the examples given in that proposal, streamUntilEof() in combination with an std.BoundedArray() would be able to completely replace readAll().

The cases that are a bit trickier to handle with streamUntilEof() include std.fs.File.preadAll(), readvAll(), and preadvAll().

I think that preadAll() could be replaced with a std.fs.File.preader(offset) function that returns a reader which starts at the given offset and uses pread syscalls. This could then be used with the general Reader.streamUntilEof() function proposed above.

For File.readvAll() and File.preadvAll() it wouldn't make any sense to try and expose an equivalent Reader API. Instead, I think a simple rename plus better documentation would suffice. I propose File.readvFillBuffers() and File.preadvFillBuffers() for new names that don't imply the entire file will be read if the file size is larger than the buffer size.

ifreund avatar Jan 30 '25 15:01 ifreund

Look at that, someone happens to have seemingly independently submitted a PR implementing streamUntilEof() just now: https://github.com/ziglang/zig/pull/22681

ifreund avatar Jan 30 '25 17:01 ifreund

Thanks for picking up on this issue.

I agree that readAll and friends are misleadingly named, but there are several reasons I disagree with their removal.

  • streamUntilEof is less convenient to use than readAll. People WILL use read and not handle short reads, which is a common footgun. stdlib APIs should be hard to misuse, and this change is the opposite of that. See also: https://github.com/DarkPlacesEngine/DarkPlaces/pull/88
  • streamUntilEof necessarily introduces an extra copy, which is useful for streamUntilDelimiter, but only hurts performance here.
  • streamUntilEof introduces an extra writer. LLVM is often unable to devirtualize and inline well enough to hide the overhead of using a function pointer. See also: #22322 for performance problems caused by Reader and Writer and #17985 for performance problems caused by streamUntilDelimiter.

Removing readAll and friends outright would make Zig code less safe by making it harder to deal with short reads, and the fact that streamUntilEof is necessarily slower than readAll means people will end up reimplementing it themselves.

How about readAll is renamed to something like fill or fillSlice? That way, it's not confused with reading the entirety of a stream.

notcancername avatar Jan 30 '25 19:01 notcancername

streamUntilEof basically already exists -- it's called LinearFifo(u8, ...).pump:

pub fn main() !void {
    // send stdin to stdout

    const stdin = std.io.getStdIn();
    const stdout = std.io.getStdOut();

    var fifo: std.fifo.LinearFifo(u8, .{ .Static = 1024 }) = .init();
    try fifo.pump(stdin, stdout);
}
const std = @import("std");

The LinearFifo acts as the temporary buffer; we read into the FIFO buffer, then write that data back out.

I'm strongly against the outright removal of readAll; it's a quite common operation, and the alternatives are quite clunky. I'd be okay with wrapping the above (or something similar, the impl is pretty simple) into a convenience function on Reader.

mlugg avatar Jan 30 '25 22:01 mlugg

@notcancername you make some good points. Perhaps a simple rename + better docs is indeed the best option here. I like readFillSlice a bit more than the plain fillSlice myself.

Perhaps it would also be good to rename the higher level read() functions for File and Reader to readPartial() to further disambiguate the two (the low level std.posix.read() should probably continue to match the name of the syscall).

I'm strongly against the outright removal of readAll; it's a quite common operation, and the alternatives are quite clunky.

From grepping around the zig source tree, it doesn't seem to be quite as common as I expected. Furthermore, many of the uses are immediately followed by an if statement checking for truncation, which is also clunky. The use cases that do not have such a check could probably use a careful audit.

ifreund avatar Jan 30 '25 23:01 ifreund

I don't think readAll should be changed to readPartial. That will probably lead people to use read instead. Technically, the read() function is the one that's partial. readAll() writes all that could be read to the buffer. The documentation pretty clearly states that "If the number read is smaller than buffer.len, it means the stream reached the end."

amarz45 avatar Feb 21 '25 21:02 amarz45

I landed here looking for the documentation of readAll to see how it works. (Ultimately this issue might just be a documentation one). A function that is "read the number of bytes I know in advance should be there or return an error" is what I am looking for. It seems like that is what readAll is designed for. Perhaps a name change is definitely in order.

Just brainstorming, perhaps readSlice is another option.

loftafi avatar May 14 '25 03:05 loftafi

A function that is "read the number of bytes I know in advance should be there or return an error" is what I am looking for.

This is provided by readNoEof. Plain read can read any number of bytes, and readAll will return the number of bytes read on EOF.

notcancername avatar May 17 '25 17:05 notcancername

I believe this was addressed by #24329

andrewrk avatar Aug 05 '25 21:08 andrewrk