async-compression
Gzip file created with async compression not decodable
Hello,
I am trying to create a stream-compressed gzip file on the fly while receiving chunks of data. My issue is that the resulting file cannot be decoded by gzip/gunzip :-(
I stripped it down to a test encoding that just encodes "test" to see what the difference is compared to a Node.js-based test encoding that works. But I do not understand why there is a difference, whether this is really a bug in the library, or an issue with how I use the async file I/O.
Here is my sample code:
use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use async_compression::tokio_02::write::GzipEncoder;
let file = File::create("test.txt.gz").await?;
let mut writer = GzipEncoder::new(file);
writer.write("test".as_bytes()).await?;
Result:
xxd test.txt.gz
00000000: 1f8b 0800 0000 0000 00ff 2a49 2d2e 0100 ..........*I-...
00000010: 0000 ffff ....
This file is not decodable! Gunzip says it is corrupt!
When I create the file via this small Node.js script
var zlib = require('zlib');
var fs = require('fs');
var gz = zlib.createGzip();
gz.pipe(fs.createWriteStream("test_node.txt.gz"));
gz.write("test");
gz.end();
the resulting file has the following content:
00000000: 1f8b 0800 0000 0000 0013 2b49 2d2e 0100 ..........+I-...
00000010: 0c7e 7fd8 0400 0000 .~......
The issue is not just the missing checksum and file length: I added the checksum and length via the gzip-header crate, and the file is still not decodable via gunzip.
The interesting bit seems to be the encoded stream itself; it differs between async-compression and the working Node.js version:
Rust: 2a49 2d2e 0100 0000 ffff
Node: 2b49 2d2e 0100
Why are there 4 trailing bytes 0000 ffff? And why is the first byte different?
I see two issues with your code:
- Calling write without checking the returned amount; each call to write may not actually consume the entire input, so you should be using write_all. (There is a clippy lint that will warn about this for you.)
- Not calling shutdown to finish the output stream; this is where the gzip trailer is written.
Fixing those issues, I see that it gives correct output:
use tokio::io::AsyncWriteExt;
use async_compression::tokio::write::GzipEncoder;
#[tokio::main]
async fn main() -> std::io::Result<()> {
let mut writer = GzipEncoder::new(Vec::new());
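// write_all keeps calling write until the entire input has been consumed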
writer.write_all("test".as_bytes()).await?;
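// shutdown finishes the compressed stream and writes the gzip trailer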
writer.shutdown().await?;
tokio::io::stdout().write_all(&writer.into_inner()).await?;
Ok(())
}
> cargo run | gunzip
Compiling foo v0.1.0 (/tmp/tmp.xrCjHsQa0S/foo)
Finished dev [unoptimized + debuginfo] target(s) in 0.77s
Running `/home/nemo157/.cargo/shared-target/debug/foo`
test
(the encoded stream is still different, but maybe gzip has multiple valid encodings of the same data 🤷)
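For completeness, here is a sketch of how the same two fixes could be applied to the original file-based code (assuming the current async_compression::tokio::write module and writing to a tokio::fs::File, as in the first snippet):
use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use async_compression::tokio::write::GzipEncoder;
#[tokio::main]
async fn main() -> std::io::Result<()> {
    let file = File::create("test.txt.gz").await?;
    let mut writer = GzipEncoder::new(file);
    // write_all ensures the whole buffer is consumed by the encoder
    writer.write_all("test".as_bytes()).await?;
    // shutdown flushes the encoder and writes the gzip trailer
    writer.shutdown().await?;
    Ok(())
}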
I ran into the same issue; the above piece of code would be great as an example on docs.rs!
writer.shutdown().await?;
But there is the same issue with the following code:
let mut file = tokio::fs::File::create("file.lzma").await.unwrap();
let mut compr = async_compression::tokio::write::LzmaEncoder::new(file);
let mut out = "test data to compress".as_bytes();
tokio::io::copy(&mut out, &mut compr).await.unwrap();
It gives no panics, but:
$ lzma -t file.lzma
lzma: file.lzma: Unexpected end of input
Without the compressor, tokio::io::copy works fine and the file is complete. Of course, it also fails with any other compressor.
P.S.: I just discovered that tokio::io::copy does not consume compr, and calling compr.shutdown() after all of that activity really helps; see the sketch below.
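A minimal sketch of that fixed version (assuming the same hypothetical file name and the tokio + lzma features of async-compression):
use tokio::io::AsyncWriteExt;
#[tokio::main]
async fn main() -> std::io::Result<()> {
    let file = tokio::fs::File::create("file.lzma").await?;
    let mut compr = async_compression::tokio::write::LzmaEncoder::new(file);
    let mut out = "test data to compress".as_bytes();
    // copy drains the reader into the encoder but does not finish the stream
    tokio::io::copy(&mut out, &mut compr).await?;
    // shutdown finishes the LZMA stream so the resulting file is complete
    compr.shutdown().await?;
    Ok(())
}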