python-zstandard
Can't decompress data which is compressed by rust code
Here is the simplest reproducing example:
import zstandard
zstandard.decompress(b'\x28\xb5\x2f\xfd\x00\x58\x11\x00\x00\x7b\x7d')
It raises an error: ZstdError: could not determine content size in frame header
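To see what the error is referring to, the frame header of the bytes above can be decoded by hand. The following is only a minimal sketch based on my reading of the Zstandard frame format (RFC 8878); `parse_frame_header` is a hypothetical helper written for illustration, not part of python-zstandard, and it uses nothing but the standard library.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528  # stored little-endian as 28 b5 2f fd


def parse_frame_header(frame: bytes) -> dict:
    """Decode the header fields of a single Zstandard frame (illustrative only)."""
    (magic,) = struct.unpack_from("<I", frame, 0)
    if magic != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")

    descriptor = frame[4]
    fcs_flag = descriptor >> 6               # Frame_Content_Size_flag
    single_segment = bool(descriptor & 0x20)
    has_checksum = bool(descriptor & 0x04)
    dict_id_flag = descriptor & 0x03
    pos = 5

    # The window descriptor is only present when Single_Segment_flag is unset.
    window_size = None
    if not single_segment:
        exponent, mantissa = frame[pos] >> 3, frame[pos] & 0x07
        base = 1 << (10 + exponent)
        window_size = base + (base // 8) * mantissa
        pos += 1

    # Skip the optional dictionary ID (0, 1, 2 or 4 bytes).
    pos += (0, 1, 2, 4)[dict_id_flag]

    # The content size field is optional; its width depends on both flags.
    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    content_size = None
    if fcs_size:
        content_size = int.from_bytes(frame[pos:pos + fcs_size], "little")
        if fcs_size == 2:
            content_size += 256  # the 2-byte form is stored with an offset
        pos += fcs_size

    return {
        "content_size": content_size,
        "window_size": window_size,
        "has_checksum": has_checksum,
        "header_size": pos,
    }


info = parse_frame_header(b"\x28\xb5\x2f\xfd\x00\x58\x11\x00\x00\x7b\x7d")
print(info)
```

For the bytes in this report the descriptor byte is 0x00, so no content size field is present and `content_size` comes back as `None`. The frame itself still looks well formed; it just does not advertise its decompressed length.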
More context
I'm trying to rewrite a client application in Rust. It sends compressed data to the server, which then decompresses it. Unfortunately, the server fails to decompress the data.
Here is how I do it on the client side:
use std::io::Cursor;
use zstd;
fn main() {
    let body = zstd::encode_all(Cursor::new("{}".as_bytes()), 3).unwrap();
    for x in body.iter() {
        // Zero-pad to two hex digits so the output is a valid Python bytes literal.
        print!("\\x{x:02x}");
    }
}
Then I copied the body and decompressed it in Python, and it failed.
If I compress the data ({} in my example) in Python and decompress it in Rust, it succeeds. So I think the issue is on the Python side.
In rust, I'm using zstd-rs for compressing/decompressing
@WindSoilder, if the size of the input is known, you can pass it to the encoder. That way the content size is added to the frame header and you will be able to decompress in Python without any issue (no need to pass max_output_size in this case).
Sample rust code:
use std::io::Write; // brings write_all into scope
let input_bytes = serde_json::to_vec(&input)?;
let mut output = Vec::new();
let mut encoder = zstd::Encoder::new(&mut output, 0)?;
encoder.set_pledged_src_size(Some(input_bytes.len() as u64))?; // This adds the needed content size in frame header
encoder.write_all(&input_bytes)?;
encoder.finish()?;
We can use something like this:
import io
import zstandard
def decompress(value: bytes):
    decompressed_data = b''
    dctx = zstandard.ZstdDecompressor()
    with io.BytesIO(value) as compressed_stream:
        with dctx.stream_reader(compressed_stream) as reader:
            while True:
                chunk = reader.read()
                if not chunk:
                    break
                decompressed_data += chunk
    return decompressed_data
As g2p says, you must declare an argument specifying the allowed max output size in order to decompress frames not advertising their length.
I'll keep this issue to track improving the error message to call this out.
But why can't this library handle it without the size? Obviously rust can, and every browser with support for zstd can. How do those decoders do it?
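As far as I understand the format, decoders that work without the size simply do not need it: every block in a frame carries its own length and a "last block" flag, so a streaming decoder can walk the blocks and grow its output as it goes. The sketch below illustrates that idea in pure Python. It is a toy, not how any real decoder is implemented: `decode_uncompressed_blocks` is a hypothetical helper that only handles Raw and RLE blocks, ignores any trailing checksum, and gives up on Compressed blocks.

```python
def decode_uncompressed_blocks(frame: bytes) -> bytes:
    """Walk the blocks of one Zstandard frame without knowing its content size."""
    if frame[:4] != b"\x28\xb5\x2f\xfd":
        raise ValueError("not a Zstandard frame")

    # Work out how long the frame header is so we can find the first block.
    descriptor = frame[4]
    single_segment = bool(descriptor & 0x20)
    fcs_size = (1 if single_segment else 0, 2, 4, 8)[descriptor >> 6]
    dict_id_size = (0, 1, 2, 4)[descriptor & 0x03]
    pos = 5 + (0 if single_segment else 1) + dict_id_size + fcs_size

    out = bytearray()
    while True:
        if pos + 3 > len(frame):
            raise ValueError("truncated frame")
        # Each block starts with a 3-byte little-endian header.
        header = int.from_bytes(frame[pos:pos + 3], "little")
        pos += 3
        last_block = header & 1
        block_type = (header >> 1) & 3
        block_size = header >> 3

        if block_type == 0:      # Raw block: block_size literal bytes follow
            out += frame[pos:pos + block_size]
            pos += block_size
        elif block_type == 1:    # RLE block: one byte repeated block_size times
            out += frame[pos:pos + 1] * block_size
            pos += 1
        else:
            raise NotImplementedError("compressed blocks are out of scope for this sketch")

        if last_block:           # the frame says where it ends; no total size needed
            return bytes(out)


print(decode_uncompressed_blocks(b"\x28\xb5\x2f\xfd\x00\x58\x11\x00\x00\x7b\x7d"))
```

The one-shot API is different because it wants to allocate the whole output buffer before decoding starts, which is why it asks for either a content size in the header or an explicit max_output_size.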