lzma-rs
LZMAError("Expected unpacked size of 149198 but decompressed to 483334")
Any ideas on what could cause this? Code:
let mut f = io::BufReader::new(fs::File::open(archive_path).unwrap());
let mut tar: Vec<u8> = Vec::new();
lzma_rs::xz_decompress(&mut f, &mut tar).unwrap();
An LZMA stream can include an unpacked_size hint in its header (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L61-L74), which the code then verifies to reject inconsistencies (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L312-L320).
Additionally, the LZMA2 format is a wrapper around LZMA, which can also provide an unpacked size hint on top of it (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma2.rs#L89-L95 and https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma2.rs#L161).
On top of that, XZ compresses each file with an LZMA2 stream.
So it looks like either your file was corrupted or there is a bug in my code due to a corner case that I didn't see before.
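For readers unfamiliar with the header hint, here is a minimal sketch of how the classic 13-byte `.lzma` header can be read. It is not the code from lzma-rs; `LzmaHeader` and `parse_lzma_header` are names invented for this illustration. It assumes the usual layout: one properties byte, a little-endian 32-bit dictionary size, and a little-endian 64-bit unpacked size where all bits set means "unknown".

```rust
/// Fields of the classic 13-byte `.lzma` header (illustrative type, not lzma-rs's).
#[derive(Debug, PartialEq)]
struct LzmaHeader {
    lc: u32,
    lp: u32,
    pb: u32,
    dict_size: u32,
    /// `None` when the header stores the "unknown size" marker (all bits set).
    unpacked_size: Option<u64>,
}

/// Hypothetical helper: parse the header from the first 13 bytes of a stream.
fn parse_lzma_header(bytes: &[u8]) -> Option<LzmaHeader> {
    if bytes.len() < 13 {
        return None;
    }
    // The properties byte encodes (pb * 5 + lp) * 9 + lc.
    let props = u32::from(bytes[0]);
    if props >= 9 * 5 * 5 {
        return None;
    }
    let lc = props % 9;
    let lp = (props / 9) % 5;
    let pb = props / 45;

    let dict_size = u32::from_le_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]);

    let mut raw = [0u8; 8];
    raw.copy_from_slice(&bytes[5..13]);
    let raw = u64::from_le_bytes(raw);
    let unpacked_size = if raw == u64::MAX { None } else { Some(raw) };

    Some(LzmaHeader { lc, lp, pb, dict_size, unpacked_size })
}
```

When the size is present, a decoder can compare it with the number of bytes it actually produced, which is the kind of check that raises the error reported here.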
- Can you comment out the error check (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L312-L320) and let me know if decompression works for your file?
- Do you know which software created this archive?
- Can you run your code with the environment variable RUST_LOG=lzma-rs=info set, so that I can get a clearer idea of what is going on?
- If the file is publicly available (or if you can reproduce the issue on a publicly available file), can you point me to it so that I can debug further?
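As background for the LZMA2 layer mentioned above, the sketch below decodes the size fields of a single LZMA2 chunk header. It is an illustration under the usual description of the format, not the lzma-rs implementation; `Lzma2Chunk`, `parse_chunk_header` and `be16` are hypothetical names. It assumes each size is stored big-endian as "value minus one", and that for compressed chunks the low 5 bits of the control byte hold the top bits of the unpacked size.

```rust
/// What one LZMA2 chunk header announces (illustrative type).
#[derive(Debug, PartialEq)]
enum Lzma2Chunk {
    /// Control byte 0x00: end of the LZMA2 stream.
    End,
    /// Control byte 0x01 or 0x02: raw bytes copied as-is.
    Uncompressed { size: u32 },
    /// Control byte >= 0x80: an LZMA-compressed chunk with both sizes announced.
    Lzma { unpacked_size: u32, packed_size: u32 },
}

/// Read a big-endian u16 from the start of `bytes`, widened to u32.
fn be16(bytes: &[u8]) -> Option<u32> {
    let b = bytes.get(..2)?;
    Some(u32::from(u16::from_be_bytes([b[0], b[1]])))
}

/// Hypothetical helper: decode the size fields of one chunk header.
fn parse_chunk_header(bytes: &[u8]) -> Option<Lzma2Chunk> {
    let (&control, rest) = bytes.split_first()?;
    match control {
        0x00 => Some(Lzma2Chunk::End),
        0x01 | 0x02 => Some(Lzma2Chunk::Uncompressed { size: be16(rest)? + 1 }),
        // Values 0x03..=0x7F are not valid control bytes.
        0x03..=0x7F => None,
        _ => {
            let high = u32::from(control & 0x1F) << 16;
            let unpacked_size = high + be16(rest)? + 1;
            let packed_size = be16(rest.get(2..)?)? + 1;
            Some(Lzma2Chunk::Lzma { unpacked_size, packed_size })
        }
    }
}
```

Because every compressed chunk announces its own unpacked size, a mismatch can be detected per chunk rather than only at the end of the whole file.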
Would https://github.com/gendx/lzma-rs/pull/17 (or a variant of it) work for this use case?
@gendx I created a reproduction, will it help?
use std::io::BufReader;
use lzma_rs::decompress::{Options, UnpackedSize};
const DATA: &[u8] = &[
93, 0, 0, 1, 0, 0, 0, 111, 253, 255, 255, 163, 183, 255, 71, 62, 72, 21, 114, 57, 97, 81, 184,
146, 40, 230, 143, 221, 66, 251, 179, 253, 113, 133, 36, 209, 157, 136, 6, 166, 184, 144, 144,
180, 72, 27, 108, 146, 211, 153, 161, 58, 255, 52, 129, 75, 240, 91, 145, 234, 14, 20, 173, 77,
167, 21, 218, 124, 215, 37, 87, 175, 123, 84, 42, 90, 42, 15, 40, 156, 200, 228, 82, 146, 100,
78, 137, 120, 145, 121, 117, 60, 144, 172, 178, 50, 13, 116, 246, 17, 195, 181, 90, 136, 248,
128, 160, 103, 203, 131, 61, 101, 79, 13, 188, 166, 86, 177, 61, 29, 24, 147, 226, 211, 42, 16,
116, 153, 103, 9, 17, 112, 188, 159, 117, 114, 125, 209, 157, 150, 224, 44, 197, 39, 232, 193,
190, 15, 0, 4, 130, 28, 84, 73, 91, 189, 120, 8, 69, 78, 165, 182, 187, 252, 105, 241, 61, 199,
210, 26, 194, 15, 70, 225, 186, 144, 150, 195, 46, 150, 103, 144, 224, 196, 136, 25, 140, 45,
169, 29, 100, 201, 225, 234, 59, 16, 254, 147, 168, 89, 240, 42, 238, 251, 69, 135, 217, 29,
243, 218, 10, 172, 191, 192, 95, 186, 36, 117, 158, 138, 110, 8, 207, 141, 154, 9, 159, 181, 3,
71, 95, 111, 99, 247, 247, 33, 89, 114, 7, 61, 46, 250, 138, 21, 2, 105, 135, 90, 83, 215, 223,
60, 180, 69, 243, 112, 226, 228, 100, 144, 11, 167, 204, 83, 148, 112, 122, 31, 30, 71, 230,
64, 211, 22, 193, 147, 121, 76, 180, 3, 79, 198, 164, 40, 176, 206, 62, 34, 200, 114, 9, 81,
33, 129, 115, 94, 77, 166, 124, 38, 148, 20, 62, 133, 46, 21, 63, 37, 112, 202, 221, 26, 34, 4,
13, 189, 74, 75, 162, 189, 241, 123, 154, 163, 59, 7, 148, 203, 156, 18, 125, 126, 147, 209,
158, 105, 231, 27, 203, 191, 132, 50, 146, 226, 22, 201, 251, 40, 255, 101, 201, 255, 75, 201,
60, 5, 36, 246, 121, 87, 144, 239, 19, 138, 52, 229, 23, 193, 207, 4, 113, 151, 154, 147, 223,
52, 140, 114, 174, 146, 90, 0, 42, 38, 113, 62, 58, 164, 224, 122, 82, 205, 66, 43, 153, 64,
134, 64, 140, 123, 119, 237, 154, 159, 175, 94, 254, 119, 160, 234, 217, 50, 124, 84, 137, 204,
160, 36, 83, 32, 91, 171, 136, 100, 221, 214, 36, 161, 168, 31, 105, 199, 188, 91, 14, 248, 37,
175, 98, 22, 164, 68, 234, 76, 175, 144, 32, 39, 10, 60, 201, 181, 100, 52, 184, 202, 194, 77,
159, 147, 177, 98, 172, 139, 31, 185, 230, 46, 171, 105, 55, 106, 24, 254, 236, 255, 110, 189,
247, 139, 213, 200, 241, 113, 20, 28, 232, 144, 194, 54, 188, 180, 193, 196, 73, 234, 60, 111,
87, 228, 113, 186, 65, 174, 66, 219, 80, 167, 249, 36, 43, 57, 144, 101, 25, 188, 250, 28, 217,
2, 203, 195, 217, 6, 52, 125, 206, 106, 211, 148, 190, 119, 126, 34, 100, 117, 218, 183, 135,
108, 77, 244, 54, 116, 167, 24, 113, 104, 211, 29, 14, 143, 255, 124, 241, 74, 135, 140, 131,
196, 245, 234, 245, 213, 189, 35, 139, 127, 212, 247, 0,
];
const PACKED_SIZE: u64 = 566;
const UNPACKED_SIZE: u64 = 5048;
fn main() {
    let mut input = BufReader::new(DATA);
    let mut output = vec![];
    let options = Options {
        unpacked_size: UnpackedSize::UseProvided(Some(UNPACKED_SIZE)),
    };
    let result = lzma_rs::lzma_decompress_with_options(&mut input, &mut output, &options);
    println!("The result is {:?}", result);
}
It prints: "Expected unpacked size of 5048 but decompressed to 5046". Packed size is 566 and 5 additional bytes are props.
Thanks @ibaryshnikov for your example.
However, I don't see how it's not behaving as expected. You provide an expected unpacked size of 5048 bytes, but the decompressed output is only 5046 bytes. When I set the expected size to 5046 your example stream decompresses fine.
So to me this works as intended - if the decompressed size doesn't match the expected one you provided, an error should be reported instead of returning any partial and/or potentially corrupted result. If you don't know the expected size, you can use UnpackedSize::ReadFromHeader (the default decoding option) - as long as the stream header provides it - or UnpackedSize::UseProvided(None).
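The behaviour described above can be sketched independently of the library. The following is only an illustration of the check, not lzma-rs code: `SizePolicy` and `verify_unpacked_size` are hypothetical names, and the error text simply mirrors the message quoted in this thread.

```rust
/// Hypothetical stand-in for "do we have an expected size or not".
#[derive(Debug, Clone, Copy, PartialEq)]
enum SizePolicy {
    /// The caller (or the header) supplied an expected size.
    Expect(u64),
    /// No size is known; decode until the end of the stream.
    UntilEndOfStream,
}

/// Compare the number of bytes actually produced with the expectation, if any.
fn verify_unpacked_size(policy: SizePolicy, actual: u64) -> Result<(), String> {
    match policy {
        SizePolicy::Expect(expected) if expected != actual => Err(format!(
            "Expected unpacked size of {} but decompressed to {}",
            expected, actual
        )),
        // Either the sizes match or there is nothing to compare against.
        _ => Ok(()),
    }
}
```

With an expectation of 5048 and an output of 5046 bytes this returns the error from the example, while an expectation of 5046, or no expectation at all, passes.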
@gendx thanks for checking this example. Detecting the end of the input is a bit tricky. The decoder can hold one code value and iterate over it several times with different ranges. In my example, the code before the last one is 1063818487, and there are two different valid ranges for it: first 2663792640, then 1320009537. Then there's a switch to the last code, which is 0. Again, we can iterate over this code using different ranges. After removing the break on
pub fn is_finished_ok(&mut self) -> io::Result<bool> {
Ok(self.code == 0 && util::is_eof(self.stream)?)
}
I got three ranges for code 0: 2212886016, 1089365498 and 547851036 (before there was only 2212886016). That's how we can find the last two bytes, and have 5048 in total. I've compared the results with the library from another language and it seems correct.
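To illustrate why a code value of 0 does not by itself mean the decoder is done, here is a simplified sketch of the bit-decoding step of an LZMA-style range decoder. It is not the lzma-rs implementation: `RangeState`, `decode_bit` and `bits_decodable_without_input` are names made up for this example, and normalization (reading the next input byte) is deliberately left out. With `code == 0` the comparison `code < bound` always succeeds, so the decoder can keep producing 0 bits, shrinking `range` each time, until `range` falls below 2^24 and another input byte would be required.

```rust
/// Range at or above this value means no new input byte is needed yet.
const TOP: u32 = 1 << 24;

/// Minimal range decoder state (illustrative, input stream omitted).
struct RangeState {
    range: u32,
    code: u32,
}

/// Decode one bit with an 11-bit adaptive probability, without normalizing.
fn decode_bit(state: &mut RangeState, prob: &mut u16) -> bool {
    let bound = (state.range >> 11) * u32::from(*prob);
    if state.code < bound {
        // Bit 0: keep the lower part of the range, raise the probability of 0.
        state.range = bound;
        *prob += (2048 - *prob) >> 5;
        false
    } else {
        // Bit 1: keep the upper part of the range, lower the probability of 0.
        state.code -= bound;
        state.range -= bound;
        *prob -= *prob >> 5;
        true
    }
}

/// Count how many bits can be decoded before a new input byte is needed.
fn bits_decodable_without_input(mut state: RangeState, mut prob: u16) -> u32 {
    let mut count = 0;
    while state.range >= TOP {
        decode_bit(&mut state, &mut prob);
        count += 1;
    }
    count
}
```

Starting from a full range with `code == 0`, this loop decodes several bits before running out of range, which is consistent with seeing more than one range for the same code value near the end of a stream.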
I don't think it's related to the original issue, where the difference between the unpacked sizes is quite large (149198 vs 483334), but it may be a separate issue. @gendx what do you think?
We are seeing the same issue, although with a very tiny difference:
LZMAError("Expected unpacked size of 116412 but decompressed to 116411")
Unfortunately, it is again in a file that I cannot share. I have also so far been unable to reproduce the issue with other files.
However, the fix in #26 works for us as well.
As mentioned in https://github.com/gendx/lzma-rs/pull/26#issuecomment-594421944 the issue can also be reproduced by compressing the tests/files/range-coder-edge-case file with options set to write the unpacked size to the header and then decompressing it:
use lzma_rs;
use std::io::prelude::*;
fn main() {
    let mut x = Vec::new();
    std::fs::File::open("tests/files/range-coder-edge-case")
        .unwrap()
        .read_to_end(&mut x)
        .unwrap();
    let encode_options = lzma_rs::compress::Options {
        unpacked_size: lzma_rs::compress::UnpackedSize::WriteToHeader(Some(x.len() as u64)),
    };
    let decode_options = lzma_rs::decompress::Options {
        unpacked_size: lzma_rs::decompress::UnpackedSize::ReadFromHeader,
    };
    let mut compressed: Vec<u8> = Vec::new();
    lzma_rs::lzma_compress_with_options(
        &mut std::io::BufReader::new(x.as_slice()),
        &mut compressed,
        &encode_options,
    )
    .unwrap();
    let mut bf = std::io::BufReader::new(compressed.as_slice());
    let mut decomp: Vec<u8> = Vec::new();
    lzma_rs::lzma_decompress_with_options(&mut bf, &mut decomp, &decode_options).unwrap();
}
I'm having the same issue with this file: http://beta.unity3d.com/download/d691e07d38ef/LinuxEditorInstaller/Unity.tar.xz
fn main() {
    let mut file = std::io::BufReader::new(std::fs::File::open("Unity.tar.xz").unwrap());
    let mut decomp: Vec<u8> = Vec::new();
    lzma_rs::xz_decompress(&mut file, &mut decomp).unwrap();
}
This code produces the following error:
LZMAError("Expected unpacked size of 153357 but decompressed to 779954")