zstd-ruby

Issue with simple decompression of a stream compressed string

Open mperice opened this issue 5 months ago • 2 comments

I have the following code, which uses 'simple' decompression (Zstd.decompress) on JSON lines that were compressed with streaming compression. Compared with decompressing the same lines after 'simple' compression, I mostly get the same result, but sometimes the decompressed output doesn't match. You can reproduce the issue with the following code:

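require "zstd-ruby"

# Returns the number of lines whose two decompressed results differ.
def decompress_compressed_lines(line_length)
  lines = (0...10_000).map { "a" * line_length }

  stream = Zstd::StreamingCompress.new

  # Streaming compression: one frame per line, reusing the same stream.
  # Only the bytes returned by finish are kept for each line.
  compressed_with_stream = lines.map do |line|
    stream.compress(line)
    stream.finish
  end

  # 'Simple' compression of the same lines, for comparison.
  compressed_with_simple = lines.map do |line|
    Zstd.compress(line)
  end

  # 'Simple' decompression of both sets.
  aaa = compressed_with_stream.map { |a| Zstd.decompress(a) }
  bbb = compressed_with_simple.map { |b| Zstd.decompress(b) }

  aaa.zip(bbb).count { |a, b| a != b }
end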
  
>> 20.times.map{PlExportCache.decompress_compressed_lines(1023)}
=> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

>> 20.times.map{PlExportCache.decompress_compressed_lines(1024)}
=> [0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0]

>> 20.times.map{PlExportCache.decompress_compressed_lines(2047)}
=> [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0]

>> 20.times.map{PlExportCache.decompress_compressed_lines(2048)}
=> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

If I replace 'simple' decompression with stream decompression, I consistently get the correct results. Is there any obvious reason why this would happen? I'm using zstd-ruby (1.5.6.6).

Edit: It seems the issue occurs only when the line length is at least 1024 and below 2048 (in the runs above, lengths 1023 and 2048 produced no mismatches).

mperice, Sep 05 '24 00:09