Can't decompress .tar.lz (lzip?)
Version
0.3.1
Description
I'm trying to decompress binutils-2.37.tar.lz (link here), but ouch is failing.
Current Behavior
Running the following command:
ouch decompress binutils-2.37.tar.lz
errors out with:
[ERROR] lzma data error
Expected Behavior
The tarball should be decompressed just like any other tarball.
Additional Information
No response
I started to look into this one to squash the bug, but it looks like ouch doesn't support the .lz file extension.
It was mentioned in a previous PR here.
@Crypto-Spartan thanks.
I wonder if we can create a sys-crate for lzip/lzlib that can decompress .lz files, similar to what bzip2-sys and other decompression crates do.
@Crypto-Spartan I started working on an lzip-rs crate.
@firasuke that's really awesome that you are adding this to the Rust ecosystem, please, keep us updated on the state of the project!
To keep you updated:
- The low-level crate lzip-sys is mostly finished. The bindings were generated using bindgen, then manually cleaned and modified a bit.
- The higher-level crate lzip-rs is still being worked on, imitating what bzip2-rs and xz2-rs are doing.
Let me hear your thoughts on this!
Awesome! This sounds perfect to me, I really like the interface of bzip2 and xz2 crates.
Aside from the code that detects extensions from file paths, we just need two lines to add an encoder and a decoder to ouch:
https://github.com/ouch-org/ouch/blob/35481bdb4d8120991cbf95dfea68f8b4373d9bd5/src/commands.rs#L323-L330
https://github.com/ouch-org/ouch/blob/35481bdb4d8120991cbf95dfea68f8b4373d9bd5/src/commands.rs#L477-L483
// so
Lzip => Box::new(lzip::write::LzEncoder::new(encoder, Default::default())),
// and
Lzip => Box::new(lzip::read::LzDecoder::new(decoder)),
Done :sunglasses:.
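For the extension-detection part mentioned above, here is a minimal sketch of how trailing extensions such as .tar.lz could be mapped to a list of formats. It is not ouch's actual code: the Format enum and the function names formats_from_path and format_from_extension are hypothetical and only illustrate the idea.

```rust
use std::path::Path;

/// Formats this sketch knows about (hypothetical enum, not ouch's own type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Format {
    Tar,
    Gzip,
    Bzip2,
    Xz,
    Lzip,
}

/// Maps a single extension to a format, if it is a known one.
fn format_from_extension(ext: &str) -> Option<Format> {
    match ext {
        "tar" => Some(Format::Tar),
        "gz" => Some(Format::Gzip),
        "bz2" => Some(Format::Bzip2),
        "xz" => Some(Format::Xz),
        "lz" => Some(Format::Lzip),
        _ => None,
    }
}

/// Returns the formats named by the trailing extensions, innermost first,
/// so "x.tar.lz" yields [Tar, Lzip].
pub fn formats_from_path(path: &Path) -> Vec<Format> {
    let name = match path.file_name().and_then(|n| n.to_str()) {
        Some(name) => name,
        None => return Vec::new(),
    };
    // Skip the first segment: it is the file stem, never an extension.
    let segments: Vec<&str> = name.split('.').skip(1).collect();
    let mut formats = Vec::new();
    // Walk from the last extension backwards and stop at the first unknown one.
    for ext in segments.iter().rev() {
        match format_from_extension(ext) {
            Some(format) => formats.push(format),
            None => break,
        }
    }
    formats.reverse();
    formats
}
```

With this sketch, binutils-2.37.tar.lz resolves to Tar followed by Lzip, because the walk stops at the unknown segment "37".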
Thanks! Though it would be:
// so
Lzip => Box::new(lzip::write::LzEncoder::new(encoder, Default::default())),
// and
Lzip => Box::new(lzip::read::LzDecoder::new(decoder)),
You're right, I fixed the typo.
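The match arms above slot into a chain of boxed readers, one decoder per detected format. The sketch below shows that pattern using only the standard library. XorDecoder is a toy stand-in for a real decoder such as the proposed lzip::read::LzDecoder (which does not exist yet), and Format and chain_decoders are hypothetical names.

```rust
use std::io::{self, Read};

/// Toy stand-in for a real streaming decoder: XORs every byte with a key.
struct XorDecoder<R> {
    inner: R,
    key: u8,
}

impl<R: Read> Read for XorDecoder<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        for byte in &mut buf[..n] {
            *byte ^= self.key;
        }
        Ok(n)
    }
}

/// Hypothetical formats; each one maps to a decoder in the match below.
#[derive(Clone, Copy)]
enum Format {
    ToyA,
    ToyB,
}

/// Wraps `source` in one boxed decoder per format, in the order given.
fn chain_decoders<'a>(source: impl Read + 'a, formats: &[Format]) -> Box<dyn Read + 'a> {
    let mut reader: Box<dyn Read + 'a> = Box::new(source);
    for format in formats {
        // Adding a new format means adding one arm here.
        reader = match format {
            Format::ToyA => Box::new(XorDecoder { inner: reader, key: 0x0f }),
            Format::ToyB => Box::new(XorDecoder { inner: reader, key: 0xf0 }),
        };
    }
    reader
}
```

Because every arm returns a Box<dyn Read>, supporting another format only needs a type that implements Read, which is why a working LzDecoder would be a small change on the ouch side.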
Alright almost done with the basic stuff.
Inside lzip-rs/src I tried to copy what xz2-rs and bzip2-rs are doing, but I was unsuccessful in creating a valid stream.rs file as the bindings generated from lzlib.h don't contain an lzip_stream or a similarly named struct.
Any thoughts on this?
@firasuke I might be completely wrong here, but I have read some code and I think Stream is used in both bzip2 and xz2 as an inner field to create structs for the Encoder and Decoder in the Rust side.
In those two libraries, there is no type distinction between encoder and decoder, every function just receives a Stream.
However, the bindings you are using have already exposed two differently typed structs, the LZ_Decoder and LZ_Encoder, so I bet you can use the two directly.
What do you think?
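One way to use the decoder struct directly is a small owning wrapper that opens the handle on construction and closes it on drop. The sketch below shows that pattern only. The mock_decompress_open and mock_decompress_close functions are hypothetical stand-ins written in Rust so the example runs without linking lzlib; a real crate would call the open/close functions declared in its bindings instead.

```rust
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Opaque handle type, shaped like the generated bindings.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct LZ_Decoder {
    _unused: [u8; 0],
}

/// Counts handles that are currently open, so the sketch can be checked.
static LIVE: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the library's "open" call (hypothetical mock, not lzlib).
unsafe extern "C" fn mock_decompress_open() -> *mut LZ_Decoder {
    LIVE.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(0u8)) as *mut LZ_Decoder
}

/// Stand-in for the library's "close" call (hypothetical mock, not lzlib).
unsafe extern "C" fn mock_decompress_close(decoder: *mut LZ_Decoder) -> i32 {
    LIVE.fetch_sub(1, Ordering::SeqCst);
    // Free the allocation made by the mock "open" call.
    unsafe { drop(Box::from_raw(decoder as *mut u8)) };
    0
}

/// Safe owner of one decoder handle; no shared Stream type is needed.
pub struct Decoder {
    raw: NonNull<LZ_Decoder>,
}

impl Decoder {
    /// Opens a handle, returning None if the library reports a null pointer.
    pub fn new() -> Option<Self> {
        let raw = unsafe { mock_decompress_open() };
        NonNull::new(raw).map(|raw| Decoder { raw })
    }
}

impl Drop for Decoder {
    fn drop(&mut self) {
        // The handle is closed exactly once, when the wrapper goes away.
        unsafe {
            mock_decompress_close(self.raw.as_ptr());
        }
    }
}

/// Number of handles that are open right now.
pub fn live_handles() -> usize {
    LIVE.load(Ordering::SeqCst)
}
```

An encoder wrapper would follow the same shape with its own handle type, which keeps the two halves separately typed.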
@marcospb19 Thanks for the suggestion.
Unfortunately, I have absolutely no idea how to implement this one last file (which could be the single most important one). That is why I tried to contact Alex Crichton, since I took most of the code from his xz2-rs and bzip2-rs projects, to get his thoughts on this crate.
I opened an issue (about lzip decompression support) in the xz2-rs repo, kindly asking him to check the new crate and give us his thoughts. Perhaps if it receives more attention he will notice it, as I've come to know that he's very busy with other Rust projects.
To make this work you'd definitely need to remove all the Lzma code from your crate and start again from scratch, copying small pieces and getting each one to work, in 'baby steps' fashion. It's absolutely impossible to copy a project of this size and make it compile; I think Alex would tell you the same.
You would also need to erase all the documentation, see, the way that xz is designed is different and the docs are not copy-paste reusable in a different crate.
The stream file is not strictly necessary, and a Stream struct is not possible, the API of lzip is different, you already have two structs so you need to use those two to implement the small encoder and decoder pieces of functionality, but one at a time, for sure.
The Stream in the other crates served both as an Encoder and Decoder, so all implementations went to the same struct.
@marcospb19 thanks for your thoughts on this. I understand that it's not possible to physically copy and paste a project of this size and expect it to work. As I've said, I've only copied parts of what's inside lzma-rs/src; the rest was all written from scratch.
To make this work you'd definitely need to remove all the Lzma code from your crate and start again from scratch, copying small pieces and getting each one to work, in 'baby steps' fashion. It's absolutely impossible to copy a project of this size and make it compile; I think Alex would tell you the same.
Part of the reason I reused parts of bzip2-rs and xz2-rs is that they're different compression crates that work and share around 80% of their code. Another reason is that lzip itself (the program) and lzlib (the library) were designed to closely mimic bzip2 (as stated by their respective upstream developers).
The stream file is not strictly necessary, and a Stream struct is not possible, the API of lzip is different, you already have two structs so you need to use those two to implement the small encoder and decoder pieces of functionality, but one at a time, for sure.
I see, so if I understood this correctly, my work should be easier here, as I already have the decoder and encoder structs and won't need a separate stream struct like bzip2-rs and xz2-rs. One question: the only field bindgen defined in the encoder and decoder structs is _unused: [u8; 0], which is supposed to represent the private fields in the original C file. Did bindgen determine that it's better this way?
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct LZ_Encoder {
_unused: [u8; 0],
}
...
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct LZ_Decoder {
_unused: [u8; 0],
}
Any ideas?
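As far as I can tell, the _unused: [u8; 0] field is what bindgen emits when a header only forward-declares a struct and never defines its fields, so the type is opaque on the Rust side and is meant to be used only behind a pointer. The sketch below shows what that implies, under that assumption; opaque_sizes is a hypothetical helper added just for the demonstration.

```rust
use std::mem::size_of;

/// Shape bindgen emits for a struct whose fields are not visible in the header.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct LZ_Encoder {
    _unused: [u8; 0],
}

/// Returns (size of the opaque type, size of a pointer to it).
pub fn opaque_sizes() -> (usize, usize) {
    (size_of::<LZ_Encoder>(), size_of::<*mut LZ_Encoder>())
}
```

The type itself is zero-sized, so Rust code can never construct or inspect a real encoder; it can only pass the pointer returned by the C library back into other C functions. Under that reading, the private fields do not need to be translated at all.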
Sorry I didn't catch this before. Seems like this is an error when generating the bindings with bindgen.
https://users.rust-lang.org/t/generating-rust-ffi-bindings-to-c-libraries/18500/4
I think you can try to generate the bindings from a .c file that only contains your struct definitions, then you can copy to your lib.rs file.
Or you can translate it manually
Taking this example from bzip2:
typedef
struct {
char *next_in;
unsigned int avail_in;
unsigned int total_in_lo32;
unsigned int total_in_hi32;
char *next_out;
unsigned int avail_out;
unsigned int total_out_lo32;
unsigned int total_out_hi32;
void *state;
void *(*bzalloc)(void *,int,int);
void (*bzfree)(void *,void *);
void *opaque;
}
bz_stream;
it was translated to
#[repr(C)]
pub struct bz_stream {
pub next_in: *mut c_char,
pub avail_in: c_uint,
pub total_in_lo32: c_uint,
pub total_in_hi32: c_uint,
pub next_out: *mut c_char,
pub avail_out: c_uint,
pub total_out_lo32: c_uint,
pub total_out_hi32: c_uint,
pub state: *mut c_void,
pub bzalloc: Option<extern "C" fn(*mut c_void, c_int, c_int) -> *mut c_void>,
pub bzfree: Option<extern "C" fn(*mut c_void, *mut c_void)>,
pub opaque: *mut c_void,
}
Similarly, in lzip if you have:
struct LZ_Encoder
{
unsigned long long partial_in_size;
unsigned long long partial_out_size;
struct LZ_encoder_base * lz_encoder_base; /* these 3 pointers make a */
struct LZ_encoder * lz_encoder; /* polymorphic encoder */
struct FLZ_encoder * flz_encoder;
enum LZ_Errno lz_errno;
bool fatal;
};
I think it can be translated to:
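#[repr(C)]
pub struct LzEncoder {
    partial_in_size: c_ulonglong,
    partial_out_size: c_ulonglong,
    lz_encoder_base: *mut LzEncoderBase, // struct LZ_encoder_base *
    lz_encoder: *mut LzEncoderImpl,      // struct LZ_encoder * (lowercase e in C; name is a placeholder)
    flz_encoder: *mut FlzEncoder,        // struct FLZ_encoder * (name is a placeholder)
    lz_errno: c_int,                     // enum LZ_Errno
    fatal: bool,                         // Rust's bool matches C's bool
}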
lz_errno is an enum, but enums in C are usually represented as libc::c_int.
You'll probably have to take care of LzEncoderBase too.
Makes sense?
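To illustrate the enum point, here is a small sketch that keeps the raw c_int at the FFI boundary and converts it to a Rust enum with a checked match instead of a transmute. The variant names and numeric values are an assumption modelled on lzlib's LZ_Errno and should be checked against lzlib.h; errno_from_raw is a hypothetical helper.

```rust
use std::os::raw::c_int;

/// Rust-side mirror of the C error enum (values assumed, verify against lzlib.h).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LzErrno {
    Ok = 0,
    BadArgument = 1,
    MemError = 2,
    SequenceError = 3,
    HeaderError = 4,
    UnexpectedEof = 5,
    DataError = 6,
    LibraryError = 7,
}

/// Converts a raw C value into the enum, rejecting values it does not know.
pub fn errno_from_raw(raw: c_int) -> Option<LzErrno> {
    Some(match raw {
        0 => LzErrno::Ok,
        1 => LzErrno::BadArgument,
        2 => LzErrno::MemError,
        3 => LzErrno::SequenceError,
        4 => LzErrno::HeaderError,
        5 => LzErrno::UnexpectedEof,
        6 => LzErrno::DataError,
        7 => LzErrno::LibraryError,
        _ => return None,
    })
}
```

On mainstream targets a fieldless #[repr(C)] enum has the same size as c_int, but a checked conversion avoids undefined behaviour if the C side ever returns a value the Rust enum does not list.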
@marcospb19 thanks for the suggestion, but I already did that. According to lzlib's manual, everything should be contained in lzlib.h. After getting the _unused: [u8; 0] above I did some googling and decided to run bindgen on the lzlib.c file, and the output was literal gibberish.
Did some cleaning on it, and removed the parts that were identical from running it on lzlib.h and it looks like this:
pub type State = ::std::os::raw::c_int;
pub const states: ::std::os::raw::c_uint = 12;
pub type _bindgen_ty_1 = ::std::os::raw::c_uint;
pub const min_dictionary_bits: ::std::os::raw::c_uint = 12;
pub const min_dictionary_size: ::std::os::raw::c_uint = 4096;
pub const max_dictionary_bits: ::std::os::raw::c_uint = 29;
pub const max_dictionary_size: ::std::os::raw::c_uint = 536870912;
pub const literal_context_bits: ::std::os::raw::c_uint = 3;
pub const literal_pos_state_bits: ::std::os::raw::c_uint = 0;
pub const pos_state_bits: ::std::os::raw::c_uint = 2;
pub const pos_states: ::std::os::raw::c_uint = 4;
pub const pos_state_mask: ::std::os::raw::c_uint = 3;
pub const len_states: ::std::os::raw::c_uint = 4;
pub const dis_slot_bits: ::std::os::raw::c_uint = 6;
pub const start_dis_model: ::std::os::raw::c_uint = 4;
pub const end_dis_model: ::std::os::raw::c_uint = 14;
pub const modeled_distances: ::std::os::raw::c_uint = 128;
pub const dis_align_bits: ::std::os::raw::c_uint = 4;
pub const dis_align_size: ::std::os::raw::c_uint = 16;
pub const len_low_bits: ::std::os::raw::c_uint = 3;
pub const len_mid_bits: ::std::os::raw::c_uint = 3;
pub const len_high_bits: ::std::os::raw::c_uint = 8;
pub const len_low_symbols: ::std::os::raw::c_uint = 8;
pub const len_mid_symbols: ::std::os::raw::c_uint = 8;
pub const len_high_symbols: ::std::os::raw::c_uint = 256;
pub const max_len_symbols: ::std::os::raw::c_uint = 272;
pub const min_match_len: ::std::os::raw::c_uint = 2;
pub const max_match_len: ::std::os::raw::c_uint = 273;
pub const min_match_len_limit: ::std::os::raw::c_uint = 5;
pub type _bindgen_ty_2 = ::std::os::raw::c_uint;
pub const bit_model_move_bits: ::std::os::raw::c_uint = 5;
pub const bit_model_total_bits: ::std::os::raw::c_uint = 11;
pub const bit_model_total: ::std::os::raw::c_uint = 2048;
pub type _bindgen_ty_3 = ::std::os::raw::c_uint;
pub type Bit_model = ::std::os::raw::c_int;
extern "C" {
pub static crc32: [u32; 256usize];
}
extern "C" {
pub static lzip_magic: [u8; 4usize];
}
pub type Lzip_header = [u8; 6usize];
pub const Lh_size: ::std::os::raw::c_uint = 6;
pub type _bindgen_ty_4 = ::std::os::raw::c_uint;
pub type Lzip_trailer = [u8; 20usize];
pub const Lt_size: ::std::os::raw::c_uint = 20;
pub type _bindgen_ty_5 = ::std::os::raw::c_uint;
pub const lzd_min_free_bytes: ::std::os::raw::c_uint = 273;
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct LZ_decoder {
pub cb: Circular_buffer,
pub partial_data_pos: ::std::os::raw::c_ulonglong,
pub rdec: *mut Range_decoder,
pub dictionary_size: ::std::os::raw::c_uint,
pub crc: u32,
pub member_finished: bool,
pub verify_trailer_pending: bool,
pub pos_wrapped: bool,
pub rep0: ::std::os::raw::c_uint,
pub rep1: ::std::os::raw::c_uint,
pub rep2: ::std::os::raw::c_uint,
pub rep3: ::std::os::raw::c_uint,
pub state: State,
pub bm_literal: [[Bit_model; 768usize]; 8usize],
pub bm_match: [[Bit_model; 4usize]; 12usize],
pub bm_rep: [Bit_model; 12usize],
pub bm_rep0: [Bit_model; 12usize],
pub bm_rep1: [Bit_model; 12usize],
pub bm_rep2: [Bit_model; 12usize],
pub bm_len: [[Bit_model; 4usize]; 12usize],
pub bm_dis_slot: [[Bit_model; 64usize]; 4usize],
pub bm_dis: [Bit_model; 115usize],
pub bm_align: [Bit_model; 16usize],
pub match_len_model: Len_model,
pub rep_len_model: Len_model,
}
#[test]
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct LZ_encoder_base {
pub mb: Matchfinder_base,
pub member_size_limit: ::std::os::raw::c_ulonglong,
pub crc: u32,
pub bm_literal: [[Bit_model; 768usize]; 8usize],
pub bm_match: [[Bit_model; 4usize]; 12usize],
pub bm_rep: [Bit_model; 12usize],
pub bm_rep0: [Bit_model; 12usize],
pub bm_rep1: [Bit_model; 12usize],
pub bm_rep2: [Bit_model; 12usize],
pub bm_len: [[Bit_model; 4usize]; 12usize],
pub bm_dis_slot: [[Bit_model; 64usize]; 4usize],
pub bm_dis: [Bit_model; 115usize],
pub bm_align: [Bit_model; 16usize],
pub match_len_model: Len_model,
pub rep_len_model: Len_model,
pub renc: Range_encoder,
pub reps: [::std::os::raw::c_int; 4usize],
pub state: State,
pub member_finished: bool,
}
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct LZ_encoder {
pub eb: LZ_encoder_base,
pub cycles: ::std::os::raw::c_int,
pub match_len_limit: ::std::os::raw::c_int,
pub match_len_prices: Len_prices,
pub rep_len_prices: Len_prices,
pub pending_num_pairs: ::std::os::raw::c_int,
pub pairs: [Pair; 274usize],
pub trials: [Trial; 8192usize],
pub dis_slot_prices: [[::std::os::raw::c_int; 58usize]; 4usize],
pub dis_prices: [[::std::os::raw::c_int; 128usize]; 4usize],
pub align_prices: [::std::os::raw::c_int; 16usize],
pub num_dis_slots: ::std::os::raw::c_int,
pub price_counter: ::std::os::raw::c_int,
pub dis_price_counter: ::std::os::raw::c_int,
pub align_price_counter: ::std::os::raw::c_int,
pub been_flushed: bool,
}
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct LZ_Encoder {
pub partial_in_size: ::std::os::raw::c_ulonglong,
pub partial_out_size: ::std::os::raw::c_ulonglong,
pub lz_encoder_base: *mut LZ_encoder_base,
pub lz_encoder: *mut LZ_encoder,
pub flz_encoder: *mut FLZ_encoder,
pub lz_errno: LZ_Errno,
pub fatal: bool,
}
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct LZ_Decoder {
pub partial_in_size: ::std::os::raw::c_ulonglong,
pub partial_out_size: ::std::os::raw::c_ulonglong,
pub rdec: *mut Range_decoder,
pub lz_decoder: *mut LZ_decoder,
pub lz_errno: LZ_Errno,
pub member_header: Lzip_header,
pub fatal: bool,
pub first_header: bool,
pub seeking: bool,
}
Any ideas?
The Lz_Encoder (with uppercase E) is almost identical to the one I've shown. You can use it as a starting point for your application and test calling functions that use this struct. It will require lots of running and testing and, again, you'll need a project that compiles.
@marcospb19 Ok so both of the Lz_Encoder and Lz_Decoder structs look as follows:
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct LZ_Encoder {
pub partial_in_size: c_ulonglong,
pub partial_out_size: c_ulonglong,
pub lz_encoder_base: *mut LZ_encoder_base,
pub lz_encoder: *mut LZ_Encoder,
pub flz_encoder: *mut FLZ_encoder,
pub lz_errno: LzErrno,
pub fatal: bool,
}
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct LZ_Decoder {
pub partial_in_size: c_ulonglong,
pub partial_out_size: c_ulonglong,
pub rdec: *mut Range_decoder,
pub lz_decoder: *mut LZ_Decoder,
pub lz_errno: LzErrno,
pub member_header: Lzip_header,
pub fatal: bool,
pub first_header: bool,
pub seeking: bool,
}
I have no idea how to get Lz_Encoder_Base working though. Would using _unused for it work, now that Lz_Encoder has been taken care of?
Have you tried removing the #[test]?
For the inner types, you'd have to figure out where each of them comes from by searching for the definitions.
@marcospb19 Ok so I've generated bindings for the entirety of lzlib.c using bindgen.
https://dpaste.org/JD3WL
So all we have to do now is grab these big structs, and find where the definitions are and place them in the final file.
I see two definitions of pub struct LZ_Encoder { and I'm not sure why.
Ok I've implemented all of the missing stuff, now it's time to get it to compile...
@marcospb19 I hit a wall again. Any ideas on how to get something close to xz2-rs's stream.rs or bzip2-rs's mem.rs, something that implements streams for lzip or uses the decoder/encoder structs directly?
From lzlib's manual:
The functions and variables forming the interface of the compression library are declared in the file 'lzlib.h'. Usage examples of the library are given in the files 'bbexample.c', 'ffexample.c', and 'minilzip.c' from the source distribution.
Perhaps if we looked at the code in bbexample.c, ffexample.c and minilzip.c we'd understand how to implement such a file?
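One possible shape for such a file is a std::io::Read adapter that pushes compressed input into the decoder and pulls decompressed output from it. The sketch below only illustrates that loop: the PushPullDecoder trait and its write/read/finish/finished methods are hypothetical, loosely modelled on the kind of calls a push/pull C decoder offers, and Passthrough is a toy decoder so the example runs without lzlib.

```rust
use std::collections::VecDeque;
use std::io::{self, Read};

/// Hypothetical safe view of a push/pull decoder.
pub trait PushPullDecoder {
    fn write(&mut self, input: &[u8]) -> io::Result<usize>; // bytes accepted
    fn read(&mut self, output: &mut [u8]) -> io::Result<usize>; // bytes produced
    fn finish(&mut self); // no more input will arrive
    fn finished(&self) -> bool; // all output has been read
}

/// Adapts a PushPullDecoder and a compressed source into a Read.
pub struct DecoderReader<R, D> {
    source: R,
    decoder: D,
    buf: Vec<u8>,
    pos: usize,
    len: usize,
    source_done: bool,
}

impl<R: Read, D: PushPullDecoder> DecoderReader<R, D> {
    pub fn new(source: R, decoder: D) -> Self {
        DecoderReader { source, decoder, buf: vec![0; 4096], pos: 0, len: 0, source_done: false }
    }
}

impl<R: Read, D: PushPullDecoder> Read for DecoderReader<R, D> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if out.is_empty() {
            return Ok(0);
        }
        loop {
            // 1. Hand back whatever the decoder has already produced.
            let n = self.decoder.read(out)?;
            if n > 0 {
                return Ok(n);
            }
            if self.decoder.finished() {
                return Ok(0);
            }
            if self.source_done && self.pos == self.len {
                return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "input ended early"));
            }
            // 2. Refill the staging buffer once it has been fully consumed.
            if self.pos == self.len {
                self.len = self.source.read(&mut self.buf)?;
                self.pos = 0;
                if self.len == 0 {
                    self.source_done = true;
                    self.decoder.finish();
                    continue;
                }
            }
            // 3. Push staged bytes; the decoder may accept only part of them.
            let accepted = self.decoder.write(&self.buf[self.pos..self.len])?;
            if accepted == 0 {
                return Err(io::Error::new(io::ErrorKind::Other, "decoder made no progress"));
            }
            self.pos += accepted;
        }
    }
}

/// Toy decoder with an 8-byte internal queue; it copies input to output.
#[derive(Default)]
pub struct Passthrough {
    queue: VecDeque<u8>,
    finishing: bool,
}

impl PushPullDecoder for Passthrough {
    fn write(&mut self, input: &[u8]) -> io::Result<usize> {
        let n = input.len().min(8usize.saturating_sub(self.queue.len()));
        self.queue.extend(&input[..n]);
        Ok(n)
    }
    fn read(&mut self, output: &mut [u8]) -> io::Result<usize> {
        let n = output.len().min(self.queue.len());
        for slot in &mut output[..n] {
            *slot = self.queue.pop_front().unwrap();
        }
        Ok(n)
    }
    fn finish(&mut self) {
        self.finishing = true;
    }
    fn finished(&self) -> bool {
        self.finishing && self.queue.is_empty()
    }
}
```

In a real crate the trait methods would be thin wrappers over the C calls on the LZ_Decoder handle, and the same loop turned around (a Write adapter) would cover the encoder.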
@marcospb19 any ideas?
Sorry, I have no idea. Have you got a version of the project already compiling?
The crate is compiling fine now, I've pushed a new release with some fixes here and there, but I still have no idea how to implement the rest of the files.
Any updates to this?
Sadly no updates, https://crates.io/crates/lzip is still empty.