zstd-ruby
zstd-ruby copied to clipboard
Issue with simple decompression of a stream compressed string
I have the following code, where I try to 'simple' decompress json lines compressed with streaming compression. When comparing to decompressions from a 'simple' compressed string I mostly get the same result, however sometimes the decompressed result doesn't match. You can replicate the issue with the following code:
def decompress_compressed_lines(line_length)
lines = (0...10_000).map{"a" * line_length}
stream = Zstd::StreamingCompress.new
compressed_with_stream = lines.map do |line|
stream.compress(line)
stream.finish
end
compressed_with_simple = lines.map do |line|
Zstd.compress(line)
end
aaa = compressed_with_stream.map{|a| Zstd.decompress(a)}
bbb = compressed_with_simple.map{|b| Zstd.decompress(b)}
aaa.zip(bbb).count{|a, b| a != b}
end
>> 20.times.map{PlExportCache.decompress_compressed_lines(1023)}
=> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>> 20.times.map{PlExportCache.decompress_compressed_lines(1024)}
=> [0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0]
>> 20.times.map{PlExportCache.decompress_compressed_lines(2047)}
=> [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
>> 20.times.map{PlExportCache.decompress_compressed_lines(2048)}
=> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
If I replace 'simple' decompression with stream decompression I consistently get the correct results. Is there any obvious reason why this would happen? Using zstd-ruby (1.5.6.6).
Edit: It seems the issue occurs only when lines are between 1024 and 2048 of length.