Decompressing large .7z files (> 4GiB) causes Python to raise MemoryError exception

Open ijacquez opened this issue 8 years ago • 9 comments

Decompressing large .7z files (> 4GiB) causes Python to raise MemoryError exception:

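for name in self.archive.getnames():
    out_filename = os.path.join(path, name)
    out_dir = os.path.dirname(out_filename)
    # Create the target directory only if it is missing.
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    # Write every member, not only those whose directory was just created.
    with open(out_filename, 'wb') as out_file:
        # read() returns the whole decompressed member in one go.
        out_file.write(self.archive.getmember(name).read())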

ijacquez avatar Oct 17 '15 23:10 ijacquez

I managed to read 7z files in chunks.

@fancycode if you have any interest in this let me know, I can wrap it up in a method and do a pull request. It could potentially solve this issue.

victor3rc avatar Feb 12 '16 15:02 victor3rc

@victor3rc, what were the results with files exceeding 4GiB?

ijacquez avatar Feb 12 '16 17:02 ijacquez

@victor3rc sure, pull requests are always welcome!

fancycode avatar Feb 13 '16 11:02 fancycode

@victor3rc I'm very interested in that code which reads 7z files in chunks. It makes little sense for the ArchiveFile class to have a single read method which reads the whole file into memory.

remyroy avatar Feb 15 '16 19:02 remyroy

@remyroy I agree.

@ijacquez I've been doing some tests with a 50+ GB file and it is reading it in chunks fine.

I'll try to wrap it in a method this week, guys.

victor3rc avatar Feb 16 '16 09:02 victor3rc

Hey @remyroy @ijacquez, just an update: I managed to read chunks but I was getting some errors when I was calling pylzma.decompressobj.decompress(chunk), specifically at the end of the file, on the final chunks.
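For reference, the general chunk-by-chunk pattern looks roughly like the sketch below. It is not the code described above: it swaps in the standard library's `lzma.LZMADecompressor` on a plain `.xz` stream instead of `pylzma.decompressobj` and the 7z container, so it says nothing about what caused the errors on the final chunks. The function name `decompress_in_chunks` and the `chunk_size` default are made up for illustration.

```python
import lzma


def decompress_in_chunks(src, dst, chunk_size=1024 * 1024):
    """Stream-decompress an .xz stream from `src` into `dst`.

    Hypothetical helper: uses the stdlib lzma module, NOT pylzma,
    and does not understand the 7z container format.
    Returns the number of decompressed bytes written.
    """
    decompressor = lzma.LZMADecompressor()
    total = 0
    while not decompressor.eof:
        if decompressor.needs_input:
            chunk = src.read(chunk_size)
            if not chunk:
                # Input ran out before the end-of-stream marker.
                raise EOFError("compressed stream ended early")
        else:
            # Output is still buffered inside the decompressor.
            chunk = b""
        # max_length caps how much output is produced per call,
        # so memory use stays bounded for highly compressible data.
        data = decompressor.decompress(chunk, max_length=chunk_size)
        dst.write(data)
        total += len(data)
    return total
```

Both `src` and `dst` only need `read` and `write`, so ordinary files opened in binary mode work as well as in-memory buffers.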

A temporary solution I have found to the problem is to use subprocess to call 7z and decompress the file locally. I then read whatever is decompressed in chunks.
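A minimal sketch of that workaround, assuming a `7z` executable is on the PATH (on some systems it is installed as `7za` or `7zr`). The helper names `extract_with_7z` and `iter_chunks` are hypothetical, not part of pylzma.

```python
import subprocess


def extract_with_7z(archive_path, out_dir):
    """Extract `archive_path` into `out_dir` with the external 7z tool.

    Assumes a `7z` binary is available on the PATH.
    """
    # "x" extracts with full paths; "-o<dir>" sets the output
    # directory (no space after -o); "-y" answers yes to prompts.
    subprocess.run(["7z", "x", archive_path, "-o" + out_dir, "-y"], check=True)


def iter_chunks(path, chunk_size=1024 * 1024):
    """Yield the contents of the file at `path` in fixed-size chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

After `extract_with_7z("big.7z", "out")`, each extracted file can be processed with `for chunk in iter_chunks(extracted_path): ...`, so only one chunk is held in memory at a time. The cost is that the decompressed data has to fit on disk.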

victor3rc avatar Feb 17 '16 15:02 victor3rc

Do you have an idea as to what is causing that? Is it your changes? Are the chunks too big?

ijacquez avatar Feb 21 '16 22:02 ijacquez

No idea, sorry. I didn't have time to look into the pylzma.decompressobj.decompress functionality, which is where the error was happening. It shouldn't be the size of the chunks, since that method is also used to read the entire file.

victor3rc avatar Feb 22 '16 11:02 victor3rc

@victor3rc, could you post your chunk-reading code, even if it didn't fully work?

igkins avatar Jun 05 '19 03:06 igkins