flate2-rs
flate2-rs copied to clipboard
flate2 ZLibDecoder reads too much data from files.
The flate2 ZLibDecoder seems to read too much data and advance file pointers too far.
Parsing a raw file filled with individual hunks of zlib compressed data:
- 80-22216 Zlib Chunk 1
dd if=sample/ctags_source-5.8-5-source.hpkg bs=1 skip=80 count=22137 of=test1file test1 ; test1: zlib compressed data
- 22217-44314 Zlib Chunk 2
dd if=sample/ctags_source-5.8-5-source.hpkg bs=1 skip=22217 count=22097 of=test2file test2 ; test2: zlib compressed data
I can validate these chunks:
cat test1 | zlib-flate -uncompress > test1.uncompressedcat test2 | zlib-flate -uncompress > test2.uncompressed
[kallisti5@eris hpkg-rs]$ ls -lah test1.uncompressed
-rw-r--r-- 1 kallisti5 users 64K Feb 28 09:19 test1.uncompressed
[kallisti5@eris hpkg-rs]$ ls -lah test2.uncompressed
-rw-r--r-- 1 kallisti5 users 64K Feb 28 09:20 test2.uncompressed
However.. when I try to decode these chunks with flate2 / ZlibDecoder..
/// Inflate the heap section of a hpkg for later processing
/// XXX: This will likely need reworked... just trying to figure out what's going on
fn inflate_heap(&mut self) -> Result<usize, Box<dyn error::Error>> {
let header = self.header.as_ref().unwrap();
let filename = self.filename.as_ref().unwrap();
println!("header: Heap chunk size: {}", header.heap_chunk_size);
println!("header: Heap compressed size: {}", header.heap_size_compressed);
println!("header: Heap uncompressed size: {}", header.heap_size_uncompressed);
println!("header: Heap compression: {}", header.heap_compression);
let mut f = File::open(filename)?;
let mut pos = header.header_size as u64;
f.seek(SeekFrom::Start(pos))?;
while pos < header.heap_size_compressed {
println!("Seek from file {}, heap {}", pos, pos - header.header_size as u64);
let mut reader: Box<dyn Read> = match header.heap_compression {
0 => Box::new(&f),
1 => Box::new(ZlibDecoder::new(&f)),
2 => Box::new(zstd::stream::read::Decoder::new(&f)?),
_ => return Err(From::from(format!("Unknown hpkg heap compression: {}", header.heap_compression)))
};
let mut buffer = vec![0; header.heap_chunk_size as usize];
reader.read_exact(&mut buffer)?;
self.heap_data.push(buffer);
pos = (&f).stream_position()?;
}
Ok(0)
}
---- package::tests::test_package_load_valid stdout ----
header: Heap chunk size: 65536 (uncompressed yo)
header: Heap compressed size: 501432
header: Heap uncompressed size: 1988947
header: Heap compression: 1
Seek from file 80, heap 0
Seek from file 32848, heap 32768
ERROR: corrupt deflate stream
thread 'package::tests::test_package_load_valid' panicked at 'assertion failed: false', src/package.rs:297:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
read_exact produces 64k uncompressed as expected, but the file pointer is moved 32k in the file vs the expected 22057 bytes. (22057 marks the "end" of the compressed data stream and the start of the next 0x78, 0xDA)
I feel like the solution is once ZLibDecoder reads the final 32k chunk, it should "back the source reader to the end of the zlib stream"?
This would enable users to know where the last compressed stream ended within a file.
https://github.com/rust-lang/flate2-rs/blob/main/src/deflate/read.rs#L161 feels like the source of the guaranteed 32k read from the files.
I think this is the same problem as #367 except that that issue is for gzip. Essentially, this is expected behavior for the read interfaces that actually wrap the Read type in a new std::io::BufReader for each decoder.
To fix, wrap the File in a BufReader once and you can then pass it to multiple bufread::ZlibDecoder instances.
This will, however, also make stream_position incorrect, even if you can access it, so you will need a different way to know when to terminate the loop.