lzma-rs

Support legacy LZMA format with `unpacked_size` 32bit long

Open XVilka opened this issue 4 years ago • 10 comments

Old versions of the LZMA SDK (e.g. from 7-Zip) used a 32-bit field for the unpacked size in the header. It would be great to have support for these too. LZMA SDK 4.05 (from 2004), for example, handles it this way.

XVilka avatar Nov 13 '19 10:11 XVilka

Do you have a reference to the code or documentation, and example files to test this? Otherwise it will be hard to add support for this use case. Feel free to send a pull request as well!

gendx avatar Nov 14 '19 12:11 gendx

Yes, see how it is done here: https://github.com/XVilka/ocaml-lzma_7z/blob/master/lzma.ml#L387

One place it was used is a very old 7-Zip SDK, which shipped in a modified CramFS version with LZMA support:

  • https://github.com/batterystaples/mkcramfs-lzma
  • https://github.com/digiampietro/lzma-uncramfs

Note how 32-bit ints are used for outsize:

  • https://github.com/digiampietro/lzma-uncramfs/blob/master/lzma-rg/SRC/7zip/Compress/LZMA_C/LzmaDecode.c
  • https://github.com/digiampietro/lzma-uncramfs/blob/master/lzma-rg/SRC/7zip/Compress/LZMA_C/decode.c#L87

XVilka avatar Nov 15 '19 03:11 XVilka

Would https://github.com/gendx/lzma-rs/pull/17 (or a variant of it) work for this use case?

gendx avatar Dec 16 '19 22:12 gendx

Yes, the ability to form the header manually is good enough, thanks!

XVilka avatar Dec 17 '19 05:12 XVilka

The implementation in https://github.com/gendx/lzma-rs/pull/17 only supports a 64-bit or 0-bit field for the unpacked size though. So I assume it wouldn't work for a 32-bit unpacked size out-of-the-box.
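As a rough illustration of the three layouts being compared, here is a minimal sketch. The enum and its method names are hypothetical and not part of lzma-rs; it also assumes the properties byte and dictionary size are laid out the same way in all three variants.

```rust
/// Hypothetical enum (not an lzma-rs type) naming the header layouts discussed in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnpackedSizeField {
    /// 64-bit little-endian size field.
    Bits64,
    /// 32-bit little-endian size field, as in the old SDK.
    Bits32,
    /// No size field in the header at all.
    Absent,
}

impl UnpackedSizeField {
    /// Width of the unpacked-size field in bytes.
    fn width(self) -> usize {
        match self {
            UnpackedSizeField::Bits64 => 8,
            UnpackedSizeField::Bits32 => 4,
            UnpackedSizeField::Absent => 0,
        }
    }

    /// Total header length: 1 properties byte + 4 dictionary-size bytes + size field.
    fn header_len(self) -> usize {
        1 + 4 + self.width()
    }
}
```

Under these assumptions the 64-bit layout gives the usual 13-byte header, while the legacy 32-bit layout would be 9 bytes long.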

gendx avatar Dec 17 '19 21:12 gendx

@gendx it seems someone else also needs this feature, not only me: https://users.rust-lang.org/t/extract-lzma-file/24793/5

XVilka avatar Jun 18 '20 09:06 XVilka

> @gendx someone also needs this feature it seems, not only me: https://users.rust-lang.org/t/extract-lzma-file/24793/5

Thanks for the pointer. I don't have a lot of time for implementing it at the moment, nor files from the old SDK to test with.

But feel free to send a pull request, I'll be happy to take a look!

gendx avatar Jun 18 '20 10:06 gendx

If the `unpacked_size` is known beforehand, #74 should cover most of this use case, except for reading the 13-byte header. You would have to read the header manually, then construct the decoder with the params.
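Reading the header manually could look roughly like the sketch below. It uses only the standard library and assumes the legacy layout is 1 properties byte, a 4-byte little-endian dictionary size, and a 4-byte little-endian unpacked size (9 bytes in total). `LegacyHeader` and `read_legacy_header` are hypothetical names, not lzma-rs APIs.

```rust
use std::io::{self, Read};

/// Hypothetical container for the parsed fields; not an lzma-rs type.
#[derive(Debug, PartialEq, Eq)]
struct LegacyHeader {
    lc: u32,
    lp: u32,
    pb: u32,
    dict_size: u32,
    unpacked_size: u32,
}

/// Consume the assumed 9-byte legacy header, leaving `input` at the start of the compressed data.
fn read_legacy_header<R: Read>(input: &mut R) -> io::Result<LegacyHeader> {
    let mut header = [0u8; 9];
    input.read_exact(&mut header)?;

    // The properties byte encodes (pb * 5 + lp) * 9 + lc, so it must be below 9 * 5 * 5.
    let props = u32::from(header[0]);
    if props >= 9 * 5 * 5 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "invalid LZMA properties byte",
        ));
    }

    let dict_size = u32::from_le_bytes([header[1], header[2], header[3], header[4]]);
    let unpacked_size = u32::from_le_bytes([header[5], header[6], header[7], header[8]]);

    Ok(LegacyHeader {
        lc: props % 9,
        lp: (props / 9) % 5,
        pb: props / (9 * 5),
        dict_size,
        unpacked_size,
    })
}
```

The parsed values would then be passed to whatever params-based constructor #74 exposes, with the reader already positioned past the header.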

chyyran avatar Aug 05 '22 18:08 chyyran

Will this do?

```rust
use byteorder::ReadBytesExt; // provides `read_u8`
use std::io;

impl LzmaParams {
    // Other methods omitted for brevity

    /// Read LZMA parameters from a legacy stream header (32-bit unpacked size).
    pub fn read_header<R>(input: &mut R, options: &Options) -> error::Result<LzmaParams>
    where
        R: io::BufRead,
    {
        // Properties: the byte encodes (pb * 5 + lp) * 9 + lc,
        // so any value of 9 * 5 * 5 = 225 or above is invalid.
        let props = input.read_u8().map_err(error::Error::HeaderTooShort)? as u32;
        if props >= 9 * 5 * 5 {
            return Err(error::Error::InvalidLzmaProperties);
        }
        let lc = props % 9;
        let lp = (props / 9) % 5;
        let pb = props / (9 * 5);

        let properties = LzmaProperties { lc, lp, pb };

        // Dictionary size (32-bit little-endian)
        let mut dict_size = [0u8; 4];
        input
            .read_exact(&mut dict_size)
            .map_err(error::Error::HeaderTooShort)?;
        let dict_size = u32::from_le_bytes(dict_size) as usize;

        // Unpacked size (32-bit little-endian in the legacy format)
        let mut unpacked_size = [0u8; 4];
        input
            .read_exact(&mut unpacked_size)
            .map_err(error::Error::HeaderTooShort)?;
        let unpacked_size = u32::from_le_bytes(unpacked_size) as usize;

        Ok(Self {
            properties,
            dict_size,
            unpacked_size,
        })
    }
}
```

gauravsaini avatar Dec 03 '22 07:12 gauravsaini

The tests would look as follows:

```rust
use std::io::Cursor;

#[test]
fn test_lzma_params_read_header() {
    // Valid 9-byte header: props 0x5D (lc=3, lp=0, pb=2), dict size 1, unpacked size 0
    let mut input = Cursor::new(b"\x5D\x01\x00\x00\x00\x00\x00\x00\x00");
    let options = Options::default();
    let params = LzmaParams::read_header(&mut input, &options);
    assert!(params.is_ok());
    let params = params.unwrap();
    assert_eq!(params.properties.lc, 3);
    assert_eq!(params.properties.lp, 0);
    assert_eq!(params.properties.pb, 2);
    assert_eq!(params.dict_size, 1);
    assert_eq!(params.unpacked_size, 0);

    // Header with a non-zero unpacked size (0x64 = 100, little-endian)
    let mut input = Cursor::new(b"\x5D\x01\x00\x00\x00\x64\x00\x00\x00");
    let options = Options::default();
    let params = LzmaParams::read_header(&mut input, &options);
    assert!(params.is_ok());
    let params = params.unwrap();
    assert_eq!(params.properties.lc, 3);
    assert_eq!(params.properties.lp, 0);
    assert_eq!(params.properties.pb, 2);
    assert_eq!(params.dict_size, 1);
    assert_eq!(params.unpacked_size, 100);

    // Header with an invalid properties byte (0xFF >= 225)
    let mut input = Cursor::new(b"\xFF\x01\x00\x00\x00\x00\x00\x00\x00");
    let options = Options::default();
    let params = LzmaParams::read_header(&mut input, &options);
    assert!(params.is_err());

    // Header with too few bytes (8 instead of 9)
    let mut input = Cursor::new(b"\x5D\x01\x00\x00\x00\x00\x00\x00");
    let options = Options::default();
    let params = LzmaParams::read_header(&mut input, &options);
    assert!(params.is_err());
}
```

gauravsaini avatar Dec 03 '22 07:12 gauravsaini