
ZlibEncoder doesn't report correct compressed size

Open moulins opened this issue 4 years ago • 7 comments

The following snippet:

fn main() {
    use std::io::Write;

    let buf = Vec::new();
    let mut file = flate2::write::ZlibEncoder::new(buf, flate2::Compression::best());
    file.write_all(b"Hello world!").unwrap();
    file.flush().unwrap();
    println!("reported size: {}", file.total_out());
    println!("actual size: {}", file.finish().unwrap().len());
}

Prints:

reported size: 20
actual size: 26

The 6 missing bytes correspond to the ZLIB header; ZlibEncoder doesn't take its size into account and only reports the size of the wrapped deflate stream.

If this is the intended behavior, I think it should be clarified in the total_out() documentation, because this is very unintuitive and unexpected.
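Until the documentation or behavior changes, one workaround is to count the bytes at the sink instead of asking the encoder. Below is a minimal sketch using only the standard library. `CountingWriter` is a hypothetical helper, not part of flate2. It wraps any `Write` and counts the bytes the inner writer actually accepts, so the count includes whatever header and trailer an encoder emits.

```rust
use std::io::{self, Write};

/// Hypothetical helper (not flate2 API): counts bytes that reach `inner`.
struct CountingWriter<W: Write> {
    inner: W,
    count: u64,
}

impl<W: Write> CountingWriter<W> {
    fn new(inner: W) -> Self {
        CountingWriter { inner, count: 0 }
    }

    /// Total bytes accepted by the inner writer so far.
    fn count(&self) -> u64 {
        self.count
    }

    fn into_inner(self) -> W {
        self.inner
    }
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Count only what the inner writer reports as written.
        let n = self.inner.write(buf)?;
        self.count += n as u64;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

fn main() -> io::Result<()> {
    // Stand-in for encoder output: a 2-byte zlib header plus some payload.
    let mut w = CountingWriter::new(Vec::new());
    w.write_all(&[0x78, 0xDA])?;
    w.write_all(b"payload")?;
    println!("bytes written: {}", w.count());
    let buf = w.into_inner();
    assert_eq!(buf.len() as u64, w_len_check(&buf));
    Ok(())
}

/// Trivial check helper for the demo above.
fn w_len_check(buf: &[u8]) -> u64 {
    buf.len() as u64
}
```

With flate2, one would pass `CountingWriter::new(Vec::new())` to `ZlibEncoder::new` and read `count()` from the writer that `finish()` returns. Before `finish()`, the count only reflects output the encoder has already pushed to the sink.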

moulins, Jul 29 '20 22:07

Thanks for the report! Is this an issue with all the backends, or just one? If it's just one, it may be a bug in that specific backend.

alexcrichton, Jul 30 '20 14:07

The bug is reproducible on all 4 backends.
For completeness, I also tested the read and bufread encoders, and their total_in method correctly returns the size including the header.

moulins, Jul 30 '20 23:07

Hm, OK, if this reproduces everywhere, it may be best to just update the documentation to indicate that it doesn't include the 6-byte header.

alexcrichton, Jul 31 '20 15:07

This means there's an asymmetry between write::ZlibEncoder and read::ZlibEncoder though: the write encoder doesn't include the header, but the read encoder does!

Personally, I would prefer this to be fixed, but I don't really know what this entails for the implementation, and as long as the current behavior is correctly documented I can live with it.

moulins, Jul 31 '20 22:07

That's true, yeah. If all the backends behave consistently, we can work around that in each implementation. Seems reasonable to fix then!

alexcrichton, Aug 03 '20 15:08

Note: I just found out that my explanation was slightly wrong, as the ZLIB header is only 2 bytes, not 6.

The 6 extra bytes actually correspond to:

  • The 2-byte ZLIB header at the start of the stream
  • The 4-byte Adler-32 checksum at the end of the stream
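For reference, RFC 1950 defines the trailer as the Adler-32 checksum of the uncompressed data, stored big-endian. The sketch below computes it and totals the framing bytes around the raw deflate data. It is illustrative only: `adler32`, `zlib_trailer` and `zlib_overhead` are hypothetical names, not flate2 API.

```rust
/// Adler-32 as specified in RFC 1950: two running sums modulo 65521.
fn adler32(data: &[u8]) -> u32 {
    const MOD_ADLER: u32 = 65521;
    let mut a: u32 = 1;
    let mut b: u32 = 0;
    for &byte in data {
        a = (a + u32::from(byte)) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    (b << 16) | a
}

/// Hypothetical helper: the 4-byte zlib trailer, i.e. the Adler-32 of the
/// uncompressed input in big-endian (network) byte order.
fn zlib_trailer(uncompressed: &[u8]) -> [u8; 4] {
    adler32(uncompressed).to_be_bytes()
}

/// Hypothetical helper: bytes of zlib framing around the raw deflate data.
fn zlib_overhead(has_dict: bool) -> u64 {
    let header = 2;
    let dictid = if has_dict { 4 } else { 0 }; // only present when FDICT is set
    let trailer = 4;
    header + dictid + trailer
}

fn main() {
    let input = b"Hello world!";
    let trailer = zlib_trailer(input);
    // The trailer is just the checksum serialized big-endian.
    assert_eq!(u32::from_be_bytes(trailer), adler32(input));
    println!("Adler-32 of the input: {:08x}", adler32(input));
    println!("framing overhead without a preset dictionary: {} bytes", zlib_overhead(false));
}
```

Because the checksum covers the uncompressed input, an encoder can only emit it once the stream is finished, which is why it shows up at the very end.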

moulins, Aug 06 '20 17:08

The zlib wrapper can also have a further 4 bytes following the header at the start of the stream if FDICT is set: the DICTID, an Adler-32 checksum of the preset dictionary. (FDICT is currently only available with the zlib back-end.)
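Here is a sketch of how the header length could be derived from the first two bytes. It follows the CM/CINFO, FCHECK and FDICT rules in RFC 1950. `ZlibHeader` and `parse_zlib_header` are hypothetical names, not part of flate2. The example header bytes are common values, not output captured from any backend.

```rust
/// Hypothetical summary of a zlib header (RFC 1950), not a flate2 type.
#[derive(Debug, PartialEq)]
struct ZlibHeader {
    window_bits: u8,   // CINFO + 8, i.e. log2 of the LZ77 window size
    fdict: bool,       // FDICT flag: a 4-byte DICTID follows the header
    header_len: usize, // 2 bytes, or 6 when DICTID is present
}

/// Parse the leading bytes of a zlib stream; returns None if they are not
/// a valid deflate-based zlib header or the DICTID is truncated.
fn parse_zlib_header(bytes: &[u8]) -> Option<ZlibHeader> {
    let (cmf, flg) = match bytes {
        [cmf, flg, ..] => (*cmf, *flg),
        _ => return None,
    };
    // CM (low nibble of CMF) must be 8, meaning deflate.
    if (cmf & 0x0F) != 8 {
        return None;
    }
    // CINFO (high nibble) above 7 is not allowed by RFC 1950.
    let cinfo = cmf >> 4;
    if cinfo > 7 {
        return None;
    }
    // FCHECK: CMF * 256 + FLG must be a multiple of 31.
    if (u32::from(cmf) * 256 + u32::from(flg)) % 31 != 0 {
        return None;
    }
    let fdict = (flg & 0x20) != 0;
    let header_len = if fdict { 6 } else { 2 };
    if bytes.len() < header_len {
        return None;
    }
    Some(ZlibHeader { window_bits: cinfo + 8, fdict, header_len })
}

fn main() {
    // 0x78 0xDA is the header typically emitted at the highest compression level.
    println!("{:?}", parse_zlib_header(&[0x78, 0xDA]));
    // 0x78 0xBB has FDICT set, so four DICTID bytes must follow.
    println!("{:?}", parse_zlib_header(&[0x78, 0xBB, 0x00, 0x00, 0x00, 0x01]));
}
```

So a total-size calculation for a zlib stream is 2 or 6 header bytes, plus the deflate data, plus the 4-byte trailer.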

oyvindln, Oct 04 '20 01:10