Split `IO` into mixins indicating their capabilities like `IO::Reader`, `IO::Writer`, `IO::Peekable`, and `IO::Seekable`
Discussion
What aspect of the language would you like to see improved?
The IO abstract class is great in a lot of ways, giving us a common interface for dealing with I/O. However, objects inheriting from it are frequently unidirectional, only allowing reading or writing, but not both, and require you to raise an exception when you do the one you're not supposed to do.
What are the reasons?
The current implementation allows you to do things that don't make any sense. For example, I can wrap a write-only IO in a read-only wrapper. It won't work, but it can currently only be enforced at runtime.
With more precise use of the type system, we can split these up so that this sort of weirdness is detected at compile time.
There are mixins like IO::Evented and IO::Buffered that declare additional details of your IO objects. They provide some default behavior, but even IO::Buffered requires you to handle both directions regardless of whether you're actually implementing both.
I also wish that some IO wrappers could be structs to avoid heap allocations when they don't accrue their own state and instead just transform or cap data passing through them, but they would have to be classes anyway with my proposed solution below for the same reason that HTTP::Handlers must be classes.
Include practical examples to illustrate your points.
A few examples in the stdlib:
- `IO::Sized` raises `IO::Error` when you try to write to it
- `IO::MultiWriter` raises `IO::Error` when you try to read from it
- `HTTP::Server::Response` raises `Exception` when you try to read from it
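A small program against today's stdlib shows the problem. It's only a sketch that relies on the `IO::Error` behavior listed above: both calls type-check, and the mistake is only caught when the program runs.

```crystal
# IO::Sized is an IO, so the compiler happily accepts a call to #write.
sized = IO::Sized.new(IO::Memory.new("hello world"), read_size: 5)

write_failed = false
begin
  sized.write("oops".to_slice)
rescue IO::Error
  # Only discovered at runtime.
  write_failed = true
end

# IO::MultiWriter is also an IO, so #read type-checks as well.
multi = IO::MultiWriter.new(IO::Memory.new, IO::Memory.new)

read_failed = false
begin
  multi.read(Bytes.new(4))
rescue IO::Error
  read_failed = true
end

puts write_failed
puts read_failed
```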
I recognize that in the early days it made a lot of sense to hammer this out and just copy what Ruby did, but I feel like this makes the capabilities more explicit.
Optionally add one (or more) proposals to improve the current situation.
This is kinda what I'm thinking of:
```crystal
abstract class IO
  module Reader
    @decoder : Decoder?

    abstract def read(bytes : Bytes) : Int32
  end

  module Writer
    @encoder : Encoder?

    abstract def write(bytes : Bytes) : Nil
  end

  include Reader
  include Writer
end
```
This way, anything that inherits from IO (files, sockets, etc.) continues to work with bidirectional I/O and doesn't need to be changed. Anything that accepts an IO::Writer can accept a bidirectional IO; it just needs to be writable.
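Here is a self-contained sketch of that shape. It's hypothetical: to avoid clashing with the real `IO`, everything lives in a made-up `Sketch` namespace, and `Pipe`, `send_greeting`, and `receive` are invented for illustration. It shows a bidirectional type satisfying both a `Writer` and a `Reader` restriction.

```crystal
# Hypothetical stand-ins for the proposed IO::Reader / IO::Writer split,
# kept in their own namespace so they don't clash with the real IO.
module Sketch
  module Reader
    abstract def read(bytes : Bytes) : Int32
  end

  module Writer
    abstract def write(bytes : Bytes) : Nil
  end

  # Plays the role of `abstract class IO`: including both mixins keeps
  # every existing subclass bidirectional without changes.
  abstract class IO
    include Reader
    include Writer
  end

  # A toy bidirectional IO backed by an in-memory byte queue.
  class Pipe < IO
    @buffer = [] of UInt8

    def read(bytes : Bytes) : Int32
      count = Math.min(bytes.size, @buffer.size)
      count.times { |i| bytes[i] = @buffer.shift }
      count
    end

    def write(bytes : Bytes) : Nil
      bytes.each { |byte| @buffer << byte }
    end
  end
end

# Each parameter states which direction it needs.
def send_greeting(io : Sketch::Writer) : Nil
  io.write("hello".to_slice)
end

def receive(io : Sketch::Reader, size : Int32) : String
  buffer = Bytes.new(size)
  String.new(buffer[0, io.read(buffer)])
end

pipe = Sketch::Pipe.new
send_greeting(pipe)    # a bidirectional IO is accepted as a Writer...
puts receive(pipe, 16) # ...and as a Reader
```

Passing a type that includes only `Sketch::Reader` to `send_greeting` would be rejected by the compiler instead of raising at runtime.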
Then should IO::Evented and IO::Buffered themselves include Reader or Writer? Without implementing some type system feature that lets you express "IO::Buffered(T) <= IO::Writer if and only if T <= IO::Writer", you'd still end up with the same runtime raises somewhere.
Also I assume you don't mean that IO itself should include those mixins. Otherwise everything inheriting from it, including e.g. IO::Sized and IO::MultiWriter, would implicitly be bidirectional if the type hierarchy is used to determine directionality.
> Then should `IO::Evented` and `IO::Buffered` themselves include `Reader` or `Writer`?
I think so. I haven’t been able to come up with any alternatives. And similar to IO including both for bidirectional I/O, maybe IO::Buffered includes both IO::Buffered::Reader and IO::Buffered::Writer mixins.
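A rough sketch of what a directional buffered mixin could look like, assuming it works like the existing `IO::Buffered` (the including type implements only `unbuffered_read`, and the mixin layers buffering on top). `Sketch::BufferedReader` and `CountingSource` are hypothetical names, and the buffer is deliberately tiny so the refills are visible.

```crystal
module Sketch
  # Hypothetical direction-only mixin from the proposal.
  module Reader
    abstract def read(bytes : Bytes) : Int32
  end

  # Hypothetical IO::Buffered::Reader: the including type only implements
  # #unbuffered_read, and this mixin provides a buffered #read on top.
  module BufferedReader
    include Reader

    @in_buffer : Bytes = Bytes.new(8) # tiny on purpose
    @in_rest : Bytes = Bytes.empty    # buffered bytes not yet handed out

    abstract def unbuffered_read(bytes : Bytes) : Int32

    def read(bytes : Bytes) : Int32
      if @in_rest.empty?
        # Refill from the underlying source; 0 means end of input.
        filled = unbuffered_read(@in_buffer)
        return 0 if filled == 0
        @in_rest = @in_buffer[0, filled]
      end

      count = Math.min(bytes.size, @in_rest.size)
      @in_rest[0, count].copy_to(bytes)
      @in_rest += count
      count
    end
  end

  # A read-only source that counts how often the raw source is hit.
  class CountingSource
    include BufferedReader

    getter raw_reads = 0
    @data : Bytes

    def initialize(string : String)
      @data = string.to_slice
    end

    def unbuffered_read(bytes : Bytes) : Int32
      @raw_reads += 1
      count = Math.min(bytes.size, @data.size)
      @data[0, count].copy_to(bytes)
      @data += count
      count
    end
  end
end

source = Sketch::CountingSource.new("hello world")
byte = Bytes.new(1)
text = String.build do |str|
  # Read one byte at a time; most reads are served from the buffer.
  while source.read(byte) == 1
    str.write(byte)
  end
end

puts text             # => hello world
puts source.raw_reads # => 3 (two refills plus the final end-of-input check)
```

Because `BufferedReader` only includes `Reader`, a read-only type picks up buffering without having to define (or raise in) any write-side methods.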
And now that I think about it, file I/O is rarely done bidirectionally on the same object, so maybe even File::Reader and File::Writer. I’m not sure.
I’m suddenly thinking back to all the Java I/O classes like java.io.BufferedReader, java.io.FileWriter, etc. I guess there really was a good reason for all those classes. 😄 Well, most of them. At least Crystal doesn’t need different types for Bytes vs Strings.
> Also I assume you don't mean that `IO` itself should include those mixins. Otherwise everything inheriting from it, including e.g. `IO::Sized` and `IO::MultiWriter`, would implicitly be bidirectional if the type hierarchy is used to determine directionality.
That is indeed what I meant. In this scenario, rather than inheriting from IO, IO::Sized would include IO::Reader and IO::MultiWriter would include IO::Writer. They would inherit only the functionality they need.
I recognize that this is probably a breaking change and wouldn’t happen until 2.0. However, IO including both directional mixins (and IO::Buffered including its own directional ones) would keep the actual breakage to a minimum. I have a feeling that most code dealing with IO subclasses like IO::Sized or IO::MultiWriter depends on the concrete classes themselves rather than on IO. Not all, but most. Most dependencies on IO that I’ve seen are on actual bidirectional I/O objects.
I feel like the hardest part would be updating the stdlib for all the file I/O (if it’s decided that that should be split out, of course), especially in the lower-level and/or platform-specific stuff in the Crystal namespace.
We could split IO's functionality across multiple modules simply as a matter of separation of concerns. What I don't understand is how it improves the "raise if IO is used in the opposite direction" situation.
There won't be an abstract def write if you're defining a read-only object, so there's no need to raise in it.
```crystal
class IO::Sized
  include IO::Reader

  # ...

  def read(bytes : Bytes)
    # ...
  end
end
```
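Fleshing that out a bit, a minimal read-only capped reader under the proposal might look like this. `Sketch::Reader` and `SizedReader` are hypothetical stand-ins, not the real `IO::Sized`; it's generic over the wrapped IO only to keep the sketch simple.

```crystal
module Sketch
  # Hypothetical direction-only mixin from the proposal (not stdlib API).
  module Reader
    abstract def read(bytes : Bytes) : Int32
  end

  # Stand-in for a read-only IO::Sized. It only promises to be readable,
  # so there is no #write to stub out with a raise.
  # T is whatever readable IO is being capped.
  class SizedReader(T)
    include Reader

    getter read_remaining : Int32

    def initialize(@io : T, @read_remaining : Int32)
    end

    def read(bytes : Bytes) : Int32
      # Never hand out more than the remaining budget.
      count = Math.min(bytes.size, @read_remaining)
      return 0 if count == 0

      n = @io.read(bytes[0, count])
      @read_remaining -= n
      n
    end
  end
end

sized = Sketch::SizedReader.new(IO::Memory.new("hello world"), 5)
buffer = Bytes.new(16)
n = sized.read(buffer)
puts String.new(buffer[0, n]) # => hello
# sized.write(...) would not compile: SizedReader has no #write method.
```

Even this simple cap needs a mutable `@read_remaining` counter, so a wrapper like this still carries its own state.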
I've had this idea in the back of my head for a while.
Virtually the only streams that are used bidirectionally are sockets. Everything else is usually unidirectional. If you look at the descendants of IO, almost all of them leave either read or write unimplemented. So it makes a lot of sense to reflect this in the type system: if something cannot possibly be used for reading, there shouldn't be a method for trying it in the first place.
Directionality issues seem to be not much of a concern in practice, though. I suppose it's usually pretty clear in what direction an IO is used. So I wouldn't expect a tighter type harness to bring much in terms of avoiding bugs. That makes this topic less of a pressing issue.
If nothing else, though, it would be a great improvement for documentation. For a method `def foo(io : IO)` it's unclear whether the parameter is used for reading or writing. Usually the context tells you, but it would be super nice to have it explicitly encoded in the type: `def foo(io : IO::Reader)`.
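To illustrate the documentation angle, here's a sketch of a copy helper whose signature states the direction of each argument. All names (`Sketch::Reader`, `Sketch::Writer`, `StringSource`, `Collector`, `transfer`) are hypothetical.

```crystal
module Sketch
  # Hypothetical direction-only mixins from the proposal (not stdlib API).
  module Reader
    abstract def read(bytes : Bytes) : Int32
  end

  module Writer
    abstract def write(bytes : Bytes) : Nil
  end

  # Read-only: serves the bytes of a string.
  class StringSource
    include Reader

    @bytes : Bytes
    @pos = 0

    def initialize(string : String)
      @bytes = string.to_slice
    end

    def read(bytes : Bytes) : Int32
      count = Math.min(bytes.size, @bytes.size - @pos)
      @bytes[@pos, count].copy_to(bytes)
      @pos += count
      count
    end
  end

  # Write-only: collects everything written to it.
  class Collector
    include Writer

    @memory = IO::Memory.new

    def write(bytes : Bytes) : Nil
      @memory.write(bytes)
    end

    def contents : String
      @memory.to_s
    end
  end
end

# The signature alone documents which side is read and which is written.
def transfer(src : Sketch::Reader, dst : Sketch::Writer) : Int64
  buffer = Bytes.new(4)
  total = 0_i64
  while (n = src.read(buffer)) > 0
    dst.write(buffer[0, n])
    total += n
  end
  total
end

sink = Sketch::Collector.new
puts transfer(Sketch::StringSource.new("direction"), sink) # => 9
puts sink.contents                                          # => direction
# transfer(sink, sink) would not compile: Collector is not a Sketch::Reader.
```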
> I also wish that some `IO` wrappers could be structs to avoid heap allocations when they don't accrue their own state and instead just transform or cap data passing through them
This could only work if they have really no state. Even data capping requires accounting of the volume that passes through.
I've wondered about enabling structs for IO before, but I'm not sure it would be all that useful. The only use case would be IOs that are purely transforming passed through data, such as IO::Hexdump and JSON::Builder::Escape. They mustn't even have their own closed state. And naturally these simple IOs are also very small, so allocation is less effort than for a big type with a static buffer, for example.
Another angle on this topic could be stack allocation of class objects. With the experimental ReferenceStorage and pre_initialize features, this becomes quite easy. Especially throw-away IOs such as String::Builder could benefit from this. We can easily use stack allocation in yielding .build methods. This should be trivial and pretty safe because these IOs are not meant to be used after the block has ended anyway.
> Directionality issues seem to be not much of a concern in practice, though.
I definitely agree with this. It's not so much "someone could accidentally do the wrong thing", but more "it doesn't make sense to even allow this, let alone force me to define what to do with it".
> The only use case would be IOs that are purely transforming passed through data, such as `IO::Hexdump` and `JSON::Builder::Escape`. They mustn't even have their own `closed` state.
These are the main ones that I write. 😄 That's where this thought came from. But also it's unlikely to make a noticeable difference in most cases anyway due to the fact that you're likely doing actual I/O, which probably consumes more CPU time than the overhead of a single heap-allocated object's lifecycle. I'd have to benchmark to be sure, though.
> Another angle on this topic could be stack allocation of class objects. With the experimental `ReferenceStorage` and `pre_initialize` features, this becomes quite easy.
I wasn't following that discussion before because I didn't realize that's what it was about, but now I'm definitely paying attention. 😄