Should `write` with a `TOKEN_END` call `finalize` on the stream to prevent memory leaks?
The goal here is to close the `TranscodingStream` without closing the underlying buffer. The documentation at https://juliaio.github.io/TranscodingStreams.jl/latest/examples/#Explicitly-finish-transcoding-by-writing-TOKEN_END-1 says to do this by writing a `TOKEN_END` token to the stream. However, doing so only flushes the stream and does not finalize it, which leads to memory leaks in code written like:
```julia
using CodecZlib
using TranscodingStreams

function leak()
    buf = IOBuffer()
    data = rand(10^6)
    while true
        zWriter = ZlibCompressorStream(buf)
        write(zWriter, data)
        write(zWriter, TranscodingStreams.TOKEN_END)
        flush(zWriter)
    end
end

leak()
```
which leaks indefinitely. Manually calling `finalize` on the `zWriter` fixes the issue, but it is not clear from the documentation that this is required. There are a few possible solutions:
- Attach a finalizer to the stream that calls `finalize`. This is not ideal because you want a more eager cleanup than whenever the GC gets to it.
- Make writing a `TOKEN_END` call `finalize` on the stream.
- Make writing a `TOKEN_END` set the stream mode to `:closed`, thereby allowing `close` on the wrapper stream to not close the underlying wrapped stream: https://github.com/JuliaIO/TranscodingStreams.jl/blob/2fac97171c2ff7b6828f49e34c22eb929cb672a1/src/stream.jl#L174-L183
Alternatively, it is possible that the code showing the leak above is "faulty". Even so, normal Julia code generally shouldn't leak like this, so at least a finalizer might be a good idea.
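For reference, the manual workaround mentioned above might look roughly like the sketch below. It assumes that "calling `finalize`" means `TranscodingStreams.finalize` applied to the stream's `codec` field; the helper name `compress_chunk!` is hypothetical and only used for illustration.

```julia
using CodecZlib
using TranscodingStreams

# Hypothetical helper: compress `data` into `buf` without closing `buf`.
function compress_chunk!(buf::IOBuffer, data)
    zWriter = ZlibCompressorStream(buf)
    try
        write(zWriter, data)
        # Finish the zlib stream and push all pending bytes into `buf`.
        write(zWriter, TranscodingStreams.TOKEN_END)
        flush(zWriter)
    finally
        # Assumption: freeing the codec's native state eagerly via the
        # codec-level `finalize` is what avoids the leak.
        TranscodingStreams.finalize(zWriter.codec)
    end
    return buf
end

buf = IOBuffer()
for _ in 1:3
    compress_chunk!(buf, rand(10^4))
end
```

The `try`/`finally` is there so the codec is released even if a write throws. `buf` stays open afterwards because `close` is never called on the wrapper stream.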
The stream is expected to still be writable after writing TOKEN_END. For example, https://github.com/BioJulia/FASTX.jl/blob/v2.1.4/src/fastq/writer.jl#L53 uses TOKEN_END in a flush function.
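A minimal sketch of that expectation, assuming gzip output and made-up record contents: data is written both before and after a `TOKEN_END`, and the result is read back as one concatenated stream.

```julia
using CodecZlib
using TranscodingStreams

buf = IOBuffer()
stream = GzipCompressorStream(buf)

write(stream, "first record\n")
write(stream, TranscodingStreams.TOKEN_END)  # finish the first gzip member

# The stream is still writable; this starts a second gzip member.
write(stream, "second record\n")
write(stream, TranscodingStreams.TOKEN_END)
flush(stream)

compressed = take!(buf)
close(stream)  # also closes `buf`, which is no longer needed here

# Reading the concatenated members back yields both records in order.
decompressed = String(read(GzipDecompressorStream(IOBuffer(compressed))))
```

Making `TOKEN_END` finalize or close the stream would break this pattern, since the second `write` would then hit a stream that is no longer usable.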
With #178 you can do:
```julia
using CodecZlib
using TranscodingStreams

function no_leak()
    buf = IOBuffer()
    data = rand(10^6)
    while true
        zWriter = ZlibCompressorStream(seekstart(buf); stop_on_end=true)
        write(zWriter, data)
        close(zWriter)
    end
end

no_leak()
```
Adding a finalizer is still a good idea.