
Gzip file created with async compression not decodable

Open thmang82 opened this issue 3 years ago • 3 comments

Hello,

I am trying to create a stream-compressed gzip file on the fly while receiving chunks of data. My issue is that the resulting file cannot be decoded by gzip/gunzip :-(

I stripped it down to a test that just encodes "test" so I could compare the output against a Node.js based test encoding that works. But I do not understand why there is a difference, and whether this is really a bug in the library or an issue with how I use the async file I/O.

Here is my sample code:

use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use async_compression::tokio_02::write::GzipEncoder;

let mut file = File::create("test.txt.gz").await?;
let mut writer = GzipEncoder::new(file);
writer.write("test".as_bytes()).await?;

Result:

xxd test.txt.gz
00000000: 1f8b 0800 0000 0000 00ff 2a49 2d2e 0100  ..........*I-...
00000010: 0000 ffff                                ....

This file is not decodable! Gunzip says it is corrupt!

When I create the file via this small Node.js script

var zlib = require('zlib');
var fs = require('fs');

var gz = zlib.createGzip();
gz.pipe(fs.createWriteStream("test_node.txt.gz"));
gz.write("test");
gz.end();

the resulting file has the following content:

00000000: 1f8b 0800 0000 0000 0013 2b49 2d2e 0100  ..........+I-...
00000010: 0c7e 7fd8 0400 0000                      .~......

The issue is not the missing checksum and file length: I added the checksum and length via the gzip-header crate, and the file is still not decodable with gunzip.

The interesting bit seems to be the encoded stream itself; it differs between async-compression and the working Node.js output:

Rust: 2a49 2d2e 0100 0000 ffff
Node: 2b49 2d2e 0100

Why are there four trailing bytes 0000 ffff? And why is the first byte different?

thmang82 commented on Jan 12, 2022

I see two issues with your code:

  1. Calling write without checking the returned amount: each call to write may not actually consume the entire input, so you should be using write_all. (There is a clippy lint, unused_io_amount, that will warn about this for you.)
  2. Not calling shutdown to finish the output stream; this is where the gzip trailer is written.

Fixing those issues, I see that it gives correct output:

use tokio::io::AsyncWriteExt;
use async_compression::tokio::write::GzipEncoder;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let mut writer = GzipEncoder::new(Vec::new());
    writer.write_all("test".as_bytes()).await?;
    writer.shutdown().await?;
    tokio::io::stdout().write_all(&writer.into_inner()).await?;
    Ok(())
}
> cargo run | gunzip
   Compiling foo v0.1.0 (/tmp/tmp.xrCjHsQa0S/foo)
    Finished dev [unoptimized + debuginfo] target(s) in 0.77s
     Running `/home/nemo157/.cargo/shared-target/debug/foo`
test

(the encoded stream is still different, but maybe gzip has multiple valid encodings of the same data 🤷)
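
For reference, the same two fixes applied to the original file-based snippet would look roughly like this. This is only a sketch, assuming a current async-compression built with the tokio and gzip features; the file name is just the one from the original example.

use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use async_compression::tokio::write::GzipEncoder;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let file = File::create("test.txt.gz").await?;
    let mut writer = GzipEncoder::new(file);
    // write_all loops until the whole input has been accepted.
    writer.write_all(b"test").await?;
    // shutdown finishes the gzip stream (this is where the trailer is written)
    // and shuts down the underlying file as well.
    writer.shutdown().await?;
    Ok(())
}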

Nemo157 commented on Jan 25, 2022

I ran into the same issue; the above piece of code would be great as an example on docs.rs!

feikesteenbergen commented on Feb 1, 2022

writer.shutdown().await?;

But there is the same issue with the following code:

let mut file = tokio::fs::File::create("file.lzma").await.unwrap();
let mut compr = async_compression::tokio::write::LzmaEncoder::new(file);
let mut out = "test data to compress".as_bytes();
tokio::io::copy(&mut out, &mut compr).await.unwrap();

It gives no panics, but:

$ lzma -t file.lzma
lzma: file.lzma: Unexpected end of input

Without the compressor, tokio::io::copy works fine and the file is complete. Of course, it also fails with any other compressor.

P.S.: And I just discovered that tokio::io::copy does not consume compr, and calling compr.shutdown() after all of that activity really helps.
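
Putting that together, a fixed version of the snippet above would look roughly like this. Again only a sketch, assuming async-compression is built with the tokio and lzma features.

use tokio::io::AsyncWriteExt;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let file = tokio::fs::File::create("file.lzma").await?;
    let mut compr = async_compression::tokio::write::LzmaEncoder::new(file);
    let mut out = "test data to compress".as_bytes();
    tokio::io::copy(&mut out, &mut compr).await?;
    // copy does not consume the encoder, so the stream has to be finished explicitly;
    // without this the trailer is never written and lzma -t reports an unexpected end of input.
    compr.shutdown().await?;
    Ok(())
}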

dmk978 commented on Aug 5, 2022