TranscodingStreams.jl icon indicating copy to clipboard operation
TranscodingStreams.jl copied to clipboard

Should `write` with an `END_TOKEN` call `finalize` on the stream to prevent memory leaks?

Open KristofferC opened this issue 3 years ago • 2 comments

The desire here is to close the TranscodingStream without closing the underlying buffer. This is documented in https://juliaio.github.io/TranscodingStreams.jl/latest/examples/#Explicitly-finish-transcoding-by-writing-TOKEN_END-1 and says that you should write a TOKEN_END token to the stream. However, an issue with that is that it only flushes the stream but it doesn't finalize it which leads to memory leaks in code written like:


using CodecZlib
using TranscodingStreams

function leak()
    buf = IOBuffer()
    data = rand(10^6)
    while true
        zWriter = ZlibCompressorStream(buf)
        write(zWriter, data)
        write(zWriter, TranscodingStreams.TOKEN_END)
        flush(zWriter)
    end
end

leak()

which will indefinitely leak. Manually calling finalize on the zWriter fixes the issue but it is not clear from the documentation that this is required. There are a few possible solutions:

  • Attach a finalizer to the stream that calls finalize. This is not ideal because you want a more eager cleanup than whenever the GC gets to it.
  • Make writing a TOKEN_END call finalize on the stream.
  • Make writing a TOKEN_END set the stream mode to :closed and thereby allowing close on the wrapper stream to not close the underlying wrapped stream : https://github.com/JuliaIO/TranscodingStreams.jl/blob/2fac97171c2ff7b6828f49e34c22eb929cb672a1/src/stream.jl#L174-L183

Alternatively, it is also possible that the code that shows the leak above is "faulty" but generally, normal Julia code shouldn't leak like this so at least a finalizer might be a good idea.

KristofferC avatar Feb 10 '22 12:02 KristofferC

The stream is expected to still be writable after writing TOKEN_END. For example, https://github.com/BioJulia/FASTX.jl/blob/v2.1.4/src/fastq/writer.jl#L53 uses TOKEN_END in a flush function.

With #178 you can do:

using CodecZlib
using TranscodingStreams

function no_leak()
    buf = IOBuffer()
    data = rand(10^6)
    while true
        zWriter = ZlibCompressorStream(seekstart(buf); stop_on_end=true)
        write(zWriter, data)
        close(zWriter)
    end
end

no_leak()

Adding a finalizer is still a good idea.

nhz2 avatar Mar 16 '24 23:03 nhz2