python-zstandard

Can't decompress data which is compressed by rust code

Open WindSoilder opened this issue 1 year ago • 3 comments

Here is the simplest reproducing example:

import zstandard
zstandard.decompress(b'\x28\xb5\x2f\xfd\x00\x58\x11\x00\x00\x7b\x7d')

It raises an error: ZstdError: could not determine content size in frame header

More context

I'm trying to rewrite a client application in Rust. It sends compressed data to a server, which then decompresses it. Unfortunately, the server fails to decompress the data.

Here is how I do it on the client side:

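use std::io::Cursor;

fn main() {
    // Compress "{}" at level 3, streaming from a reader (no input size is pledged).
    let body = zstd::encode_all(Cursor::new("{}".as_bytes()), 3).unwrap();
    // Print each byte as a zero-padded escape such as \x00, ready to paste into Python.
    for x in body.iter() {
        print!("\\x{x:02x}");
    }
}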

I then copied the body and tried to decompress it in Python, which failed.

If I compress the data ({} in my example) in Python and decompress it in Rust, it succeeds, so I think the issue is on the Python side.

In Rust, I'm using zstd-rs for compressing and decompressing.

WindSoilder avatar Aug 12 '24 08:08 WindSoilder

See #150; you must pass a max_output_size to decompress in this case.

Docs

g2p avatar Sep 23 '24 10:09 g2p

@WindSoilder, if the size of the input is known, you can pass it to the encoder. The content size is then added to the frame header, and you can decompress in Python without any issue (no need to pass max_output_size in this case). Sample Rust code:

    use std::io::Write; // brings write_all into scope

    let input_bytes = serde_json::to_vec(&input)?;
    let mut output = Vec::new();
    // Level 0 selects zstd's default compression level.
    let mut encoder = zstd::Encoder::new(&mut output, 0)?;
    encoder.set_pledged_src_size(Some(input_bytes.len() as u64))?;   // This adds the needed content size in frame header
    encoder.write_all(&input_bytes)?;
    encoder.finish()?;
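
To check whether a given frame actually declares its content size, the header can be inspected by hand. The sketch below follows the frame header layout in RFC 8878 and uses only the standard library. The function name frame_content_size is made up for illustration, and SIZED_FRAME is a hand-built header with the single-segment flag set, not output captured from either library.

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # magic number 0xFD2FB528, little-endian

# Frame from the original report (produced by zstd::encode_all).
ISSUE_FRAME = b"\x28\xb5\x2f\xfd\x00\x58\x11\x00\x00\x7b\x7d"
# Hand-built frame for comparison: single-segment header with a 1-byte size field.
SIZED_FRAME = b"\x28\xb5\x2f\xfd\x20\x02\x11\x00\x00\x7b\x7d"


def frame_content_size(frame: bytes):
    """Return the content size declared in a zstd frame header, or None."""
    if frame[:4] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    descriptor = frame[4]
    fcs_flag = descriptor >> 6              # Frame_Content_Size_flag (bits 7-6)
    single_segment = (descriptor >> 5) & 1  # Single_Segment_flag (bit 5)
    dict_id_flag = descriptor & 3           # Dictionary_ID_flag (bits 1-0)

    pos = 5
    if not single_segment:
        pos += 1                            # skip Window_Descriptor
    pos += (0, 1, 2, 4)[dict_id_flag]       # skip Dictionary_ID, if any

    if fcs_flag == 0 and not single_segment:
        return None                         # header has no Frame_Content_Size field
    fcs_len = (1, 2, 4, 8)[fcs_flag]
    value = int.from_bytes(frame[pos:pos + fcs_len], "little")
    # Two-byte size fields are stored with an offset of 256.
    return value + 256 if fcs_len == 2 else value


print(frame_content_size(ISSUE_FRAME))  # None
print(frame_content_size(SIZED_FRAME))  # 2
```

The frame from the report has a descriptor byte of 0x00, so no size field is present, which matches the "could not determine content size in frame header" error.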

aarshivv avatar Feb 05 '25 18:02 aarshivv

We can use something like this:

import io
import zstandard

def decompress(value: bytes):
    decompressed_data = b''

    dctx = zstandard.ZstdDecompressor()
    with io.BytesIO(value) as compressed_stream:
        with dctx.stream_reader(compressed_stream) as reader:
            while True:
                chunk = reader.read()
                if not chunk:
                    break
                decompressed_data += chunk

    return decompressed_data

rstm-sf avatar Feb 17 '25 07:02 rstm-sf

As g2p says, you must pass an argument specifying the allowed max output size in order to decompress frames that do not advertise their length.

I'll keep this issue to track improving the error message to call this out.

indygreg avatar Aug 17 '25 16:08 indygreg

But why can't this library handle it without the size? Obviously rust can, and every browser with support for zstd can. How do those decoders do it?

az-faro avatar Nov 10 '25 12:11 az-faro
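
On the last question, one way to see why the size is not strictly required: per RFC 8878, a zstd frame is a sequence of blocks, and the final block is marked with a Last_Block bit. A streaming decoder can therefore append output block by block into a growing buffer, which is also what the stream_reader approach earlier in this thread relies on. The sketch below is an illustration only, not how python-zstandard, zstd-rs, or any browser is implemented. It handles just raw and RLE blocks, ignores the optional checksum, and the function name is hypothetical.

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"

# Frame from the original report; its header declares no content size.
ISSUE_FRAME = b"\x28\xb5\x2f\xfd\x00\x58\x11\x00\x00\x7b\x7d"


def decode_uncompressed_blocks(frame: bytes) -> bytes:
    """Walk the blocks of a zstd frame without knowing the output size."""
    if frame[:4] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    descriptor = frame[4]
    fcs_flag = descriptor >> 6
    single_segment = (descriptor >> 5) & 1
    pos = 5
    pos += 0 if single_segment else 1        # Window_Descriptor
    pos += (0, 1, 2, 4)[descriptor & 3]      # Dictionary_ID
    if fcs_flag or single_segment:
        pos += (1, 2, 4, 8)[fcs_flag]        # Frame_Content_Size, if present

    out = bytearray()                        # grows as blocks are decoded
    while True:
        if pos + 3 > len(frame):
            raise ValueError("truncated frame")
        header = int.from_bytes(frame[pos:pos + 3], "little")
        pos += 3
        last_block = header & 1
        block_type = (header >> 1) & 3
        block_size = header >> 3
        if block_type == 0:                  # Raw_Block: bytes stored as-is
            out += frame[pos:pos + block_size]
            pos += block_size
        elif block_type == 1:                # RLE_Block: one byte repeated
            out += frame[pos:pos + 1] * block_size
            pos += 1
        else:
            raise NotImplementedError("compressed blocks are out of scope here")
        if last_block:                       # end of frame reached
            return bytes(out)


print(decode_uncompressed_blocks(ISSUE_FRAME))  # b'{}'
```

The one-shot decompress() call differs in that it needs an output size up front, either from the frame header or from max_output_size, as explained above.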