pylzma
Data error during extraction
Error when running example:
Traceback (most recent call last):
  File "E:\Python\7ztest.py", line 38, in <module>
    sevenZfile.extractall('.')
  File "E:\Python\7ztest.py", line 33, in extractall
    outfile.write(self.archive.getmember(name).read())
  File "E:\Python\py7zlib.py", line 632, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "E:\Python\py7zlib.py", line 717, in _read_lzma2
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "E:\Python\py7zlib.py", line 688, in _read_from_decompressor
    data = decompressor.decompress(input)
ValueError: data error during decompression
Source
import py7zlib
import os

class SevenZFile(object):
    @classmethod
    def is_7zfile(cls, filepath):
        '''
        Class method: determine if file path points to a valid 7z archive.
        '''
        is7z = False
        fp = None
        try:
            fp = open(filepath, 'rb')
            archive = py7zlib.Archive7z(fp)
            n = len(archive.getnames())
            is7z = True
        finally:
            if fp:
                fp.close()
        return is7z

    def __init__(self, filepath):
        fp = open(filepath, 'rb')
        self.archive = py7zlib.Archive7z(fp)

    def extractall(self, path):
        for name in self.archive.getnames():
            outfilename = os.path.join(path, name)
            outdir = os.path.dirname(outfilename)
            if not os.path.exists(outdir):
                os.makedirs(outdir)
            outfile = open(outfilename, 'wb')
            outfile.write(self.archive.getmember(name).read())
            outfile.close()

if SevenZFile.is_7zfile('DP_LAN_Realtek-XP_18000.7z'):
    sevenZfile = SevenZFile('DP_LAN_Realtek-XP_18000.7z')
    sevenZfile.extractall('.')
Could you please provide a (small) sample file that shows the error? Also, which version of pylzma are you running?
Hello fancycode, thank you for responding to the above comment because I am having the same problem as the original poster. Here is my extractall function that I use on 7z archives:
def extractall(path):
    with open(item, 'rb') as fp:
        archive = py7zlib.Archive7z(fp)
        for name in archive.getnames():
            outfilename = os.path.join(path, name)
            outdir = os.path.dirname(outfilename)
            if not os.path.exists(outdir):
                os.makedirs(outdir)
            with open(outfilename, 'wb') as outfile:
                acv = archive.getmember(name)
                outfile.write(acv.read())

extractall(path=os.curdir)
I've narrowed down where the data errors come from: executable files (.exe on Windows) are extracted with the proper file name but as zero-byte files, and .dll files fail to extract entirely and return the data error. The same function works perfectly on a folder full of JPEG wallpapers or XML files; .dll and .exe files are the only ones I have seen return errors. I would really like to use this function, since it's recursive and handles folders and subfolders. If you need anything else to help narrow down this problem, just ask and I will see what I can provide.
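To narrow down which members fail, as described above, a loop that catches the error per member can help. This is only a rough sketch: `find_failing_members` and `report` are hypothetical helper names, and the only py7zlib calls assumed are the ones already used in this thread (`Archive7z`, `getnames`, `getmember(...).read()`), with `ValueError` being the exception shown in the traceback.

```python
def find_failing_members(archive):
    """Try to read every member and sort the names into two lists.

    `archive` is expected to behave like py7zlib.Archive7z as used in
    this thread: getnames() and getmember(name).read().
    """
    ok, failed = [], []
    for name in archive.getnames():
        try:
            data = archive.getmember(name).read()
        except ValueError as exc:
            # The tracebacks in this thread show a ValueError on bad data.
            failed.append((name, str(exc)))
        else:
            # Keep the size so zero-byte results are easy to spot.
            ok.append((name, len(data)))
    return ok, failed


def report(archive_path):
    """Hypothetical driver; needs pylzma installed to import py7zlib."""
    import py7zlib

    with open(archive_path, "rb") as fp:
        ok, failed = find_failing_members(py7zlib.Archive7z(fp))
    for name, size in ok:
        print("OK     %10d  %s" % (size, name))
    for name, message in failed:
        print("FAILED %s: %s" % (name, message))
```

Calling `report('some_archive.7z')` would list each member with its extracted size or its error, which makes it easier to see whether the failures really line up with .exe and .dll files.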
Master now supports various BCJ filters. Could you please check if this solves the issues you were having?
This new addition works, but only up to 128 kilobytes; past that limit it stops reading and creates zero-byte files again. I verified a 70 kilobyte file with Md5Checker and the hash matched exactly, so there seems to be a consistent data limit that applies to .dll files and executables.
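The same hash check can be done from Python rather than with a separate tool. The sketch below uses only the standard library; `md5_of_file` and `same_content` are hypothetical helper names, and the reference file is assumed to have been extracted with another tool such as 7-Zip itself.

```python
import hashlib


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(extracted_path, reference_path):
    """Compare a py7zlib-extracted file against a known-good copy."""
    return md5_of_file(extracted_path) == md5_of_file(reference_path)
```

Running `same_content` on files above and below 128 kilobytes would show quickly whether the size limit is the only thing going wrong.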
Should be fixed with the latest change, could you please test again?
Works perfectly! A pythonic way to extract 7z archives is amazing. All .dll files and .exe file hashes match. If I uncover anything else I can submit a new error report, thank you!
Great, thanks for reporting & testing!
Incidentally I just had the same problem and this fixed it. Any chance of getting a new release on PyPI with this fix? Thanks!
Thanks @fancycode for the fix, sorry for MIA catching up on subscriptions
Successfully installed pylzma-v0.5.0-17-gccb0dev
Traceback (most recent call last):
  File "C:\System\Users\Biatu\Dev\Python\DriverMgr\dev_drivermgr.py", line 38, in <module>
    sevenZfile.extractall('.')
  File "C:\System\Users\Biatu\Dev\Python\DriverMgr\dev_drivermgr.py", line 33, in extractall
    outfile.write(self.archive.getmember(name).read())
  File "C:\ProgramData\Miniconda3\lib\site-packages\py7zlib.py", line 650, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "C:\ProgramData\Miniconda3\lib\site-packages\py7zlib.py", line 750, in _read_lzma2
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "C:\ProgramData\Miniconda3\lib\site-packages\py7zlib.py", line 717, in _read_from_decompressor
    data = decompressor.decompress(input, self._start+total_decompressed)
ValueError: data error during decompression
Process returned 1 (0x1) execution time : 1.242 s
Press any key to continue . . .
Sources+7z file: https://drive.google.com/file/d/1YUgE15Tt2OS07X6Yzc_afc98UjRsfbQs/view?usp=sharing
So this still happens on master for you :disappointed: Could you please provide a (small) file I can use for testing?
Check my edit, included sources
Thanks, I would need a .7z file that fails to decompress, not the Python source you are using for extracting.
It's in the archive I linked
Oops, sorry that link didn't show up earlier. After an explicit refresh I can now see it. Thanks.
np, and ty :)
Any updates on this?
The first file, compressed with only LZMA2, extracts just fine, but the binary with multiple coders fails:

Realtek/matchver/FORCED/6x86/RTL8150_5.126.0411.2008/NET8150.INF
{'digest': 824600073, '_start': 25372901, '_src_start': 32, '_folder': {'coders': [{'method': b'!', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'\x1d'}], 'digestdefined': False, 'totalout': 1, 'bindpairs': [], 'packed_indexes': [0], 'unpacksizes': [100331479], 'solid': True}, '_maxsize': 445821, 'emptystream': False, 'filename': 'Realtek/matchver/FORCED/6x86/RTL8150_5.126.0411.2008/NET8150.INF', 'attributes': 32, 'compressed': 445821, '_uncompressed': [5726], 'size': 5726, 'uncompressed': 5726, 'pos': 0}
Realtek/matchver/FORCED/6x86/RTL8150_5.126.0411.2008/RTL8150.SYS
{'digest': 572653165, '_start': 68116417, '_src_start': 445853, '_folder': {'coders': [{'method': b'\x03\x01\x01', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'l\x00\x00\x10\x00'}, {'method': b'\x03\x01\x01', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'l\x00\x00\x10\x00'}, {'method': b'!', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'\x1d'}, {'method': b'\x03\x03\x01\x1b', 'numinstreams': 4, 'numoutstreams': 1}], 'digestdefined': False, 'totalout': 4, 'bindpairs': [(5, 0), (4, 1), (3, 2)], 'packed_indexes': [2, 6, 1, 0], 'unpacksizes': [1450568, 5837200, 79220364, 86508132], 'solid': True}, '_maxsize': 16244592, 'emptystream': False, 'filename': 'Realtek/matchver/FORCED/6x86/RTL8150_5.126.0411.2008/RTL8150.SYS', 'attributes': 32, 'compressed': 16244592, '_uncompressed': [21504, 21504, 21504, 21504], 'size': 21504, 'uncompressed': 21504, 'pos': 0}
[{'method': b'\x03\x01\x01', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'l\x00\x00\x10\x00'}, {'method': b'\x03\x01\x01', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'l\x00\x00\x10\x00'}, {'method': b'!', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'\x1d'}, {'method': b'\x03\x03\x01\x1b', 'numinstreams': 4, 'numoutstreams': 1}]
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 653, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 741, in _read_lzma
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 723, in _read_from_decompressor
    data = decompressor.decompress(input, self._start+total_decompressed)
ValueError: data error during decompression
Traceback (most recent call last):
  File "...\idx.py", line 74, in <module>
    sevenZfile.extractall('.')
  File "...\idx.py", line 66, in extractall
    md=mn.read()
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 658, in read
    raise e
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 653, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 741, in _read_lzma
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 723, in _read_from_decompressor
    data = decompressor.decompress(input, self._start+total_decompressed)
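The `method` bytes in the coder lists printed above are easier to compare once they are given names. The sketch below maps the IDs seen in this dump to the names used in the 7-Zip method ID list, as far as I know it; `KNOWN_METHODS` and `describe_coders` are hypothetical names, not part of py7zlib, and the table is deliberately incomplete.

```python
# 7z coder method IDs, limited to the common ones relevant to this thread.
KNOWN_METHODS = {
    b"\x00": "Copy",
    b"\x21": "LZMA2",              # printed as b'!' in the dump
    b"\x03\x01\x01": "LZMA",
    b"\x03\x03\x01\x03": "BCJ (x86)",
    b"\x03\x03\x01\x1b": "BCJ2",
}


def describe_coders(coders):
    """Translate a folder's coder list into readable method names."""
    names = []
    for coder in coders:
        method = coder["method"]
        # Fall back to the raw hex for any ID not in the table.
        names.append(KNOWN_METHODS.get(method, "unknown " + method.hex()))
    return names


# Coder methods of the folder holding RTL8150.SYS, copied from the dump.
failing = [
    {"method": b"\x03\x01\x01"},
    {"method": b"\x03\x01\x01"},
    {"method": b"!"},
    {"method": b"\x03\x03\x01\x1b"},
]
print(describe_coders(failing))  # ['LZMA', 'LZMA', 'LZMA2', 'BCJ2']
```

Read this way, the member that extracts cleanly sits in a folder with a single LZMA2 coder, while the failing one sits in a four-coder folder that ends in BCJ2 with four input streams. That is consistent with the traceback ending in `_read_lzma` rather than `_read_lzma2`.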
Are there any updates on this? This still fails.
More info:
I pulled 7zdec.exe from lzma1900.7z and compressed it with 7zFM using Ultra compression. Decoding the resulting archive fails completely.
