sevenz-rust
Decompressing 7z+ZSTD is missing entries
Context:
- I compress a folder using py7zr >= 0.21.0:
import os
from py7zr import FILTER_ZSTD, SevenZipFile

ZSTD_FILTER = [{"id": FILTER_ZSTD, "level": ZSTD_COMPRESSION_LEVEL}]
with SevenZipFile(<path-to-7z>, mode="w", filters=ZSTD_FILTER) as zst_handle:
    for root, dirs, files in os.walk(<input-path>):
        # Both files and directories are written as archive entries.
        for node in files + dirs:
            zst_handle.write(os.path.join(root, node), os.path.relpath(os.path.join(root, node), <input-path>))
- Using
sevenz-rust = { version = "0.6.1", features = ["zstd"] }
I then try to decompress this file with sevenz-rust:
sevenz_rust::decompress_file(7z_file, &args.dest)
- This results in a 'successful' extraction, but it is actually missing a series of files in the directories
- This archive can be extracted without issue using the 7z utility so it appears to be well formatted.
Debugging:
- Debugging this with a local version of sevenz-rust shows that this loop in reader.rs is not iterating all of the files
- Changing this line to
for file_index in start..(archive.files.len() + start)
fixed this in my case
- In this archive self.archive.folders.len() = 1, so we only poke folder_dec.for_each_entries once
- However, this does not iterate all the files because the file_count (computed by archive.folders[folder_index].num_unpack_sub_streams) appears to be too low:
file_count = 204
archive.files.len() = 233
- It looks like file_count is read from the archive's numStreams header on this line, and it is indeed 204 in this case, not 233
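One possible explanation for the gap, offered as a hedged sketch rather than a confirmed diagnosis: in the 7z format, entries without data (directories and zero-length files) are recorded as empty streams and are not counted among a folder's unpack sub-streams, and the compression loop above writes directories as entries. The Rust model below is self-contained and does not use sevenz-rust; `ListedEntry`, `stream_count` and `synthetic_listing` are hypothetical names, and the 204/29 split is only chosen to match the totals above, not measured from the archive.

```rust
/// Hypothetical, simplified model of an archive entry.
/// This is NOT the sevenz-rust entry type.
#[derive(Debug, Clone)]
struct ListedEntry {
    name: String,
    is_directory: bool,
    size: u64,
}

impl ListedEntry {
    /// Assumption: directories and zero-length files carry no data stream.
    fn has_stream(&self) -> bool {
        !self.is_directory && self.size > 0
    }
}

/// Counts the entries that would show up as unpack sub-streams.
fn stream_count(entries: &[ListedEntry]) -> usize {
    entries.iter().filter(|e| e.has_stream()).count()
}

/// Builds a synthetic listing: `files` non-empty files followed by `dirs` directories.
fn synthetic_listing(files: usize, dirs: usize) -> Vec<ListedEntry> {
    let mut entries = Vec::with_capacity(files + dirs);
    for i in 0..files {
        entries.push(ListedEntry {
            name: format!("file_{i}.bin"),
            is_directory: false,
            size: 1,
        });
    }
    for i in 0..dirs {
        entries.push(ListedEntry {
            name: format!("dir_{i}"),
            is_directory: true,
            size: 0,
        });
    }
    entries
}
```

If the real archive follows this pattern, the difference between archive.files.len() and num_unpack_sub_streams would equal the number of directories plus empty files, which can be checked by listing the archive with the 7z utility.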
Questions / thoughts:
- Is it safe to rely on the numStreams header to determine the number of loop iterations?
- Is there a better way to determine the loop iterations?
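To make the second question concrete, here is a minimal sketch of one alternative: walk every entry in the file list and advance through the decoded streams only for entries that actually have one. It is a self-contained model, not the sevenz-rust implementation; `PlanEntry`, `Action` and `plan_extraction` are hypothetical names, and it assumes entries with a stream appear in the same order as the streams in the folder.

```rust
/// Hypothetical entry model; NOT the sevenz-rust entry type.
#[derive(Debug, Clone)]
struct PlanEntry {
    name: String,
    is_directory: bool,
    has_stream: bool,
}

/// What an extractor would do for one entry.
#[derive(Debug, PartialEq)]
enum Action {
    CreateDir(String),
    CreateEmptyFile(String),
    WriteStream { name: String, stream_index: usize },
}

/// Iterates over ALL entries, so the loop bound is the entry count,
/// while the stream index only advances for entries that have data.
fn plan_extraction(entries: &[PlanEntry]) -> Vec<Action> {
    let mut next_stream = 0usize;
    entries
        .iter()
        .map(|entry| {
            if entry.has_stream {
                let action = Action::WriteStream {
                    name: entry.name.clone(),
                    stream_index: next_stream,
                };
                next_stream += 1;
                action
            } else if entry.is_directory {
                Action::CreateDir(entry.name.clone())
            } else {
                Action::CreateEmptyFile(entry.name.clone())
            }
        })
        .collect()
}
```

With this shape, the number of stream reads still matches numStreams, but directories and empty files are no longer skipped when the two counts differ.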
Any help or advice would be much appreciated.