CodecZstd.jl icon indicating copy to clipboard operation
CodecZstd.jl copied to clipboard

Reusing a compressor

Open JonasIsensee opened this issue 1 year ago • 6 comments

This was found here: https://github.com/JuliaIO/JLD2.jl/issues/599

When using a ZstdFrameCompressor or a ZstdCompressor, you reproducably get a segfault when trying to reuse the same compressor instance:

I don't know if this can be made to just work, but I hope that this can at least be caught in time and throw an error instead.

using CodecZstd, TranscodingStreams

compressor = ZstdFrameCompressor()
x = rand(UInt8, 4*10^7);
TranscodingStreams.initialize(compressor)
ret1 = transcode(compressor, x);
TranscodingStreams.finalize(compressor)

# compress again using the same compressor
TranscodingStreams.initialize(compressor) # segfault happens here!
# ret2 = transcode(compressor, x);
# TranscodingStreams.finalize(compressor)
[1706176] signal (11.1): Segmentation fault
in expression starting at REPL[7]:1
ZSTD_CCtx_reset at /home/jisensee/.julia/artifacts/4c45bf9c8292490acd9463bbfbf168277d9720b6/lib/libzstd.so (unknown line)
ZSTD_initCStream at /home/jisensee/.julia/artifacts/4c45bf9c8292490acd9463bbfbf168277d9720b6/lib/libzstd.so (unknown line)
ZSTD_initCStream at /home/jisensee/.julia/packages/CodecZstd/KmRP9/src/LibZstd_clang.jl:277 [inlined]
initialize! at /home/jisensee/.julia/packages/CodecZstd/KmRP9/src/libzstd.jl:60 [inlined]
initialize at /home/jisensee/.julia/packages/CodecZstd/KmRP9/src/compression.jl:82

JonasIsensee avatar Sep 15 '24 06:09 JonasIsensee

Maybe the initialize and finalize methods in this package can be removed, and instead initialize on construction and finalize using a finalizer?

nhz2 avatar Sep 15 '24 13:09 nhz2

@mkitti It looks like the finalize method cannot be made into a noop, and any use of a codec after calling finalize can lead to seg faults, so JLD2 will need to move those initialize and finalize calls.

#71 adds a finalizer to prevent memory leaks even if finalize is never called after compressing.

nhz2 avatar Sep 15 '24 21:09 nhz2

No Julia call should ever segfault. Could we detect the condition and throw an error before segfaulting?

I'm also thinking we should completely redo the JLD2 compression mechanjsm...

mkitti avatar Sep 15 '24 22:09 mkitti

Yes, that requires adding checks for null pointers everywhere the codec is used.

nhz2 avatar Sep 15 '24 23:09 nhz2

With #72 I get

julia> using CodecZstd, TranscodingStreams

julia> compressor = ZstdFrameCompressor()
ZstdFrameCompressor(level=3)

julia> x = rand(UInt8, 4*10^7);

julia> TranscodingStreams.initialize(compressor)

julia> ret1 = transcode(compressor, x);

julia> TranscodingStreams.finalize(compressor)

julia> TranscodingStreams.initialize(compressor)
ERROR: codec use after free
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] initialize(codec::ZstdCompressor)
   @ CodecZstd ~/github/CodecZstd.jl/src/compression.jl:83
 [3] top-level scope
   @ REPL[7]:1

The benchmarks are unaffected as well.

nhz2 avatar Sep 15 '24 23:09 nhz2

I'm also thinking we should completely redo the JLD2 compression mechanjsm...

Absolutely, I already have some ideas. The current version was an experiment that worked slightly too well for its own good.

JonasIsensee avatar Sep 16 '24 06:09 JonasIsensee