TranscodingStreams.jl icon indicating copy to clipboard operation
TranscodingStreams.jl copied to clipboard

Why implement `Buffer` and `Memory`?

Open warrenalphonso opened this issue 2 years ago • 2 comments

Hey I've just been poking around this package and it seems overly complex. Can someone more familiar with it make sure I haven't just missed Chesterton's fence 😛 Happy to work on this if people agree it makes sense

AFAICT, TranscodingStream manages an input and output Buffer; data is read from the input stream into the input Buffer. This is passed to a codec via Memory (which is just an abstraction over the "used" and "free" parts of a Buffer). The codec writes to the output Buffer, and that's given to the user via an output stream when requested.

  • Remove Memory. I can't figure out why this is useful, aside from just making things slightly easier for downstream codecs that are C-based and need pointers. If that's the case, we could views instead.
  • Remove Buffer. We really don't need our own implementation; I think IOBuffer should work. Fundamentally, we just want to load data from the input stream into some input array, let a codec convert part of that to an output array, and send it to the user via a stream.

Less importantly:

  • Remove Error. Why not just try/catch instead of passing yet another variable around as state?

warrenalphonso avatar Oct 10 '22 20:10 warrenalphonso

Actually, IOBuffer doesn't work so well because it uses the same pointer when reading/writing. What we want is a buffered stream.

Maybe it makes sense to use BufferedStreams.jl so TranscodingStreams.jl doesn't need to ship its own fast buffer implementation? One problem is that BufferedStreams.jl requires choosing between BufferedInputStream and BufferedOutputStream structs. This doesn't seem compatible with the current TranscodingStream design which uses the given stream as either input or output depending on whether it's in :read or :write mode.

I think it's better design to have two separate structs for input and output transcoding, instead of keeping this logic inside some state component. It would also make the logic simpler since we don't have to worry about writing while in read state or vice versa.

Any thoughts? Maybe @jakobnissen ?

warrenalphonso avatar Oct 13 '22 22:10 warrenalphonso

It seems complex to me too, but my experience is that usually stuff is complicated because there are edge cases I havne't thought of. Memory is a way to abstract over input memory, which may not be in the form of a Vector{UInt8}, i.e. it may be a SubString{String}. It may be possible to instead make it a parameterized Julia type instead of storing the raw pointer.

I don't think we can remove Buffer. We need low-level control of the implementation, and so would have to rely on implementation details of IOBuffer. We also need more features like Buffer.marked, which IOBuffer does not provide.

jakobnissen avatar Jan 09 '23 13:01 jakobnissen