pylzma

Data error during extraction

Open BiatuAutMiahn opened this issue 6 years ago • 20 comments

Error when running example:

Traceback (most recent call last):
  File "E:\Python\7ztest.py", line 38, in <module>
    sevenZfile.extractall('.')
  File "E:\Python\7ztest.py", line 33, in extractall
    outfile.write(self.archive.getmember(name).read())
  File "E:\Python\py7zlib.py", line 632, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "E:\Python\py7zlib.py", line 717, in _read_lzma2
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "E:\Python\py7zlib.py", line 688, in _read_from_decompressor
    data = decompressor.decompress(input)
ValueError: data error during decompression

Source

import py7zlib
import os

class SevenZFile(object):
    @classmethod
    def is_7zfile(cls, filepath):
        '''
        Class method: determine if file path points to a valid 7z archive.
        '''
        is7z = False
        fp = None
        try:
            fp = open(filepath, 'rb')
            archive = py7zlib.Archive7z(fp)
            n = len(archive.getnames())
            is7z = True
        finally:
            if fp:
                fp.close()
        return is7z

    def __init__(self, filepath):
        fp = open(filepath, 'rb')
        self.archive = py7zlib.Archive7z(fp)

    def extractall(self, path):
        for name in self.archive.getnames():
            outfilename = os.path.join(path, name)
            outdir = os.path.dirname(outfilename)
            if not os.path.exists(outdir):
                os.makedirs(outdir)
            outfile = open(outfilename, 'wb')
            outfile.write(self.archive.getmember(name).read())
            outfile.close()
			
if SevenZFile.is_7zfile('DP_LAN_Realtek-XP_18000.7z'):
    sevenZfile = SevenZFile('DP_LAN_Realtek-XP_18000.7z')
    sevenZfile.extractall('.')
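
Note that `is_7zfile` above has a `try`/`finally` with no `except`, so for a file that is not a 7z archive it raises rather than returning False. A minimal sketch of a cheaper pre-check, assuming that checking only the six-byte 7z signature is enough for this purpose (`looks_like_7z` is a hypothetical helper name, not part of py7zlib):

```python
# Six-byte signature found at the start of every 7z archive.
SEVENZ_MAGIC = b"7z\xbc\xaf\x27\x1c"


def looks_like_7z(filepath):
    """Return True if the file starts with the 7z signature, else False."""
    try:
        with open(filepath, "rb") as fp:
            return fp.read(len(SEVENZ_MAGIC)) == SEVENZ_MAGIC
    except OSError:
        # Missing or unreadable file: treat it as "not a 7z archive".
        return False
```

This only inspects the header. It says nothing about whether py7zlib can decode the coders used inside the archive, which is what the traceback above is about.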

BiatuAutMiahn avatar Jan 01 '19 21:01 BiatuAutMiahn

Could you please provide a (small) sample file that shows the error? Also, which version of pylzma are you running?
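
One way to answer the version question from Python itself is sketched below. It assumes Python 3.8 or newer and that the package was installed under the distribution name `pylzma`; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(package="pylzma"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())
```

If this prints None, the package is not installed in the interpreter that is running the script, which is worth ruling out before comparing tracebacks.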

fancycode avatar Feb 01 '19 11:02 fancycode

Hello fancycode, thank you for responding to the above comment because I am having the same problem as the original poster. Here is my extractall function that I use on 7z archives:

def extractall(path):
    # NOTE: `item` (the path of the 7z archive) is not defined in this snippet
    with open(item, 'rb') as fp:
      archive = py7zlib.Archive7z(fp)
      for name in archive.getnames():
        outfilename = os.path.join(path, name)
        outdir = os.path.dirname(outfilename)
        if not os.path.exists(outdir):
          os.makedirs(outdir)
        with open(outfilename, 'wb') as outfile:
          acv = archive.getmember(name)
          outfile.write(acv.read())

extractall(path=os.curdir)

I've narrowed down where the data errors come from: mostly executable files (.exe on Windows). Extraction creates a file with the correct name, but it is a zero-byte file. .dll files also fail to extract completely and return the data error. However, this function works perfectly with a folder full of JPEG wallpapers or XML files; .dll and .exe files are the only ones I have seen return errors. I would really like to use this function, since it can handle folders and subfolders recursively. If you need anything else to help narrow down this problem, just ask and I will see what I can provide.
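
A minimal sketch of one way to see exactly which members fail without leaving empty output files behind: read each member before opening the output file, and collect the failures. `extract_members` is a hypothetical helper; it assumes only the `getnames()` / `getmember(name).read()` calls already used in this thread, and that decoding failures surface as `ValueError` as in the traceback above.

```python
import os


def extract_members(archive, dest):
    """Extract every member of `archive` into `dest`.

    `archive` is any object offering getnames() and getmember(name).read(),
    for example a py7zlib.Archive7z instance.
    Returns a list of (member name, error message) pairs for failed members.
    """
    failures = []
    for name in archive.getnames():
        try:
            # Decode first, so a failure never leaves a zero-byte file.
            data = archive.getmember(name).read()
        except ValueError as exc:  # e.g. "data error during decompression"
            failures.append((name, str(exc)))
            continue
        outfilename = os.path.join(dest, name)
        outdir = os.path.dirname(outfilename)
        if outdir:
            os.makedirs(outdir, exist_ok=True)
        with open(outfilename, "wb") as outfile:
            outfile.write(data)
    return failures
```

Called as `extract_members(py7zlib.Archive7z(fp), os.curdir)`, the returned list gives the member names and messages to paste into a report.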

Jimmy-Jon avatar Mar 22 '19 19:03 Jimmy-Jon

Master now supports various BCJ filters, could you please check if this solves the issues you were having?

fancycode avatar Mar 24 '19 17:03 fancycode

This new addition works, but only up to 128 kilobytes. After it hits that limit it stops reading and creates zero-byte files again. I verified a 70-kilobyte file with Md5Checker and the hash matched exactly; there just seems to be a consistent size limit, which applies to .dll files and executables.
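
The same check can be scripted with the standard library instead of Md5Checker. A minimal sketch, assuming a reference copy of each file (for example one extracted with 7-Zip itself) is available to compare against; `file_md5` and `same_content` are hypothetical helper names.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(extracted, reference):
    """True if both files have the same size and the same MD5 digest."""
    if os.path.getsize(extracted) != os.path.getsize(reference):
        return False  # catches truncated and zero-byte output cheaply
    return file_md5(extracted) == file_md5(reference)
```

Comparing sizes first flags the zero-byte and truncated cases described above before any hashing is done.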

Jimmy-Jon avatar Mar 24 '19 19:03 Jimmy-Jon

Should be fixed with the latest change, could you please test again?

fancycode avatar Mar 24 '19 22:03 fancycode

Works perfectly! A pythonic way to extract 7z archives is amazing. All .dll files and .exe file hashes match. If I uncover anything else I can submit a new error report, thank you!

Jimmy-Jon avatar Mar 24 '19 22:03 Jimmy-Jon

Great, thanks for reporting & testing!

fancycode avatar Mar 24 '19 22:03 fancycode

Incidentally I just had the same problem and this fixed it. Any chance of getting a new release on PyPI with this fix? Thanks!

embray avatar Apr 21 '19 16:04 embray

Thanks @fancycode for the fix, sorry for being MIA, catching up on subscriptions

BiatuAutMiahn avatar Jun 14 '19 05:06 BiatuAutMiahn

Successfully installed pylzma-v0.5.0-17-gccb0dev

Traceback (most recent call last):
  File "C:\System\Users\Biatu\Dev\Python\DriverMgr\dev_drivermgr.py", line 38, in <module>
    sevenZfile.extractall('.')
  File "C:\System\Users\Biatu\Dev\Python\DriverMgr\dev_drivermgr.py", line 33, in extractall
    outfile.write(self.archive.getmember(name).read())
  File "C:\ProgramData\Miniconda3\lib\site-packages\py7zlib.py", line 650, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "C:\ProgramData\Miniconda3\lib\site-packages\py7zlib.py", line 750, in _read_lzma2
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "C:\ProgramData\Miniconda3\lib\site-packages\py7zlib.py", line 717, in _read_from_decompressor
    data = decompressor.decompress(input, self._start+total_decompressed)
ValueError: data error during decompression

Process returned 1 (0x1)        execution time : 1.242 s
Press any key to continue . . .

Sources+7z file: https://drive.google.com/file/d/1YUgE15Tt2OS07X6Yzc_afc98UjRsfbQs/view?usp=sharing

BiatuAutMiahn avatar Jun 14 '19 07:06 BiatuAutMiahn

So this still happens on master for you :disappointed: Could you please provide a (small) file I can use for testing?

fancycode avatar Jun 14 '19 07:06 fancycode

Check my edit, included sources

BiatuAutMiahn avatar Jun 14 '19 07:06 BiatuAutMiahn

Thanks, I would need a .7z file that fails to decompress, not the Python source you are using for extracting.

fancycode avatar Jun 14 '19 07:06 fancycode

It's in the archive I linked

BiatuAutMiahn avatar Jun 14 '19 07:06 BiatuAutMiahn

Oops, sorry that link didn't show up earlier. After an explicit refresh I can now see it. Thanks.

fancycode avatar Jun 14 '19 07:06 fancycode

np, and ty :)

BiatuAutMiahn avatar Jun 14 '19 07:06 BiatuAutMiahn

Any updates on this?

BiatuAutMiahn avatar Aug 31 '19 21:08 BiatuAutMiahn

The first file, compressed with only LZMA2, extracts just fine, but the binary with multiple coders fails

Realtek/matchver/FORCED/6x86/RTL8150_5.126.0411.2008/NET8150.INF
{'digest': 824600073, '_start': 25372901, '_src_start': 32, '_folder': {'coders': [{'method': b'!', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'\x1d'}], 'digestdefined': False, 'totalout': 1, 'bindpairs': [], 'packed_indexes': [0], 'unpacksizes': [100331479], 'solid': True}, '_maxsize': 445821, 'emptystream': False, 'filename': 'Realtek/matchver/FORCED/6x86/RTL8150_5.126.0411.2008/NET8150.INF', 'attributes': 32, 'compressed': 445821, '_uncompressed': [5726], 'size': 5726, 'uncompressed': 5726, 'pos': 0}


Realtek/matchver/FORCED/6x86/RTL8150_5.126.0411.2008/RTL8150.SYS
{'digest': 572653165, '_start': 68116417, '_src_start': 445853, '_folder': {'coders': [{'method': b'\x03\x01\x01', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'l\x00\x00\x10\x00'}, {'method': b'\x03\x01\x01', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'l\x00\x00\x10\x00'}, {'method': b'!', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'\x1d'}, {'method': b'\x03\x03\x01\x1b', 'numinstreams': 4, 'numoutstreams': 1}], 'digestdefined': False, 'totalout': 4, 'bindpairs': [(5, 0), (4, 1), (3, 2)], 'packed_indexes': [2, 6, 1, 0], 'unpacksizes': [1450568, 5837200, 79220364, 86508132], 'solid': True}, '_maxsize': 16244592, 'emptystream': False, 'filename': 'Realtek/matchver/FORCED/6x86/RTL8150_5.126.0411.2008/RTL8150.SYS', 'attributes': 32, 'compressed': 16244592, '_uncompressed': [21504, 21504, 21504, 21504], 'size': 21504, 'uncompressed': 21504, 'pos': 0}
[{'method': b'\x03\x01\x01', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'l\x00\x00\x10\x00'}, {'method': b'\x03\x01\x01', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'l\x00\x00\x10\x00'}, {'method': b'!', 'numinstreams': 1, 'numoutstreams': 1, 'properties': b'\x1d'}, {'method': b'\x03\x03\x01\x1b', 'numinstreams': 4, 'numoutstreams': 1}]
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 653, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 741, in _read_lzma
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 723, in _read_from_decompressor
    data = decompressor.decompress(input, self._start+total_decompressed)
ValueError: data error during decompression

Traceback (most recent call last):
  File "...\idx.py", line 74, in <module>
    sevenZfile.extractall('.')
  File "...\idx.py", line 66, in extractall
    md=mn.read()
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 658, in read
    raise e
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 653, in read
    data = getattr(self, decoder)(coder, data, level, num_coders)
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 741, in _read_lzma
    return self._read_from_decompressor(coder, dec, input, level, num_coders, with_cache=True)
  File "C:\Program Files\Python37\lib\site-packages\py7zlib.py", line 723, in _read_from_decompressor
    data = decompressor.decompress(input, self._start+total_decompressed)
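
For readers comparing the two dumps above, a minimal sketch that labels the coder method IDs. The ID-to-name table follows the method list shipped with 7-Zip (Methods.txt) and covers only the IDs seen here plus plain x86 BCJ; `describe_coders` is a hypothetical helper that works on the `_folder` dict as printed above.

```python
# 7z coder method IDs, limited to those relevant to this thread.
METHOD_NAMES = {
    b"\x21": "LZMA2",             # printed as b'!' in the dumps above
    b"\x03\x01\x01": "LZMA",
    b"\x03\x03\x01\x03": "BCJ (x86)",
    b"\x03\x03\x01\x1b": "BCJ2",
}


def describe_coders(folder):
    """Return one readable label per coder in a py7zlib folder dict."""
    labels = []
    for coder in folder["coders"]:
        method = coder["method"]
        name = METHOD_NAMES.get(method, "unknown " + method.hex())
        labels.append("%s (in=%d, out=%d)" % (
            name, coder["numinstreams"], coder["numoutstreams"]))
    return labels
```

Applied to the two folders above, the .INF member uses a single LZMA2 coder, while the .SYS member chains two LZMA coders and one LZMA2 coder into a BCJ2 coder with four input streams. This only labels the dump; it does not by itself show where the decoding goes wrong.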

BiatuAutMiahn avatar Dec 25 '19 21:12 BiatuAutMiahn

Are there any updates on this? This still fails

BiatuAutMiahn avatar Apr 12 '20 07:04 BiatuAutMiahn

More info: I pulled 7zdec.exe from lzma1900.7z and compressed it with 7zFM using Ultra compression. Decoding the Ultra archive completely chokes.

BiatuAutMiahn avatar Apr 12 '20 09:04 BiatuAutMiahn