pylzma Decompressing large .7z files (> 4GiB) causes Python to raise MemoryError exception

Open ijacquez opened this issue 8 years ago • 9 comments

Decompressing large .7z files (> 4GiB) causes Python to raise MemoryError exception:

for name in self.archive.getnames():
    out_filename = os.path.join(path, name)
    out_dir = os.path.dirname(out_filename)
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    # read() returns the whole decompressed member as a single bytes object
    with open(out_filename, 'wb') as out_file:
        out_file.write(self.archive.getmember(name).read())

Oct 17 '15 23:10 ijacquez

I managed to read 7z files in chunks.

@fancycode if you have any interest in this let me know, I can wrap it up in a method and do a pull request. It could potentially solve this issue.

Feb 12 '16 15:02 victor3rc

@victor3rc, what were the results with files exceeding 4GiB?

Feb 12 '16 17:02 ijacquez

@victor3rc sure, pull requests are always welcome!

Feb 13 '16 11:02 fancycode

@victor3rc I'm very interested in that code which reads 7z files in chunks. It makes little sense for the ArchiveFile class to have a single read method which reads the whole file into memory.

Feb 15 '16 19:02 remyroy
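
For illustration only, here is a minimal sketch of what chunked decompression can look like. It is not victor3rc's code and it does not use pylzma or py7zlib: it swaps in the standard library's lzma.LZMADecompressor, which handles .xz/LZMA streams rather than .7z containers. The function name decompress_in_chunks and the 1 MiB chunk size are assumptions made up for this example.

```python
import lzma

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an arbitrary size chosen for this sketch


def decompress_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Stream-decompress an .xz/LZMA stream from src into dst.

    src and dst are binary file-like objects. At most chunk_size bytes of
    compressed input and chunk_size bytes of output are handled per step.
    Returns the number of decompressed bytes written.
    """
    decompressor = lzma.LZMADecompressor()
    written = 0
    while not decompressor.eof:
        if decompressor.needs_input:
            chunk = src.read(chunk_size)
            if not chunk:
                raise EOFError("compressed stream ended before its end marker")
        else:
            # The decompressor still holds buffered input; just drain it.
            chunk = b""
        # max_length caps how much output a single call may return.
        data = decompressor.decompress(chunk, max_length=chunk_size)
        dst.write(data)
        written += len(data)
    return written
```

The point of the pattern is that peak memory depends on the chunk size, not on the size of the decompressed file, which is what a single read() cannot offer.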

@remyroy I agree.

@ijacquez I've been doing some tests with a 50+ GB file and it is reading it in chunks fine.

I'll try to wrap it in a method this week guys.

Feb 16 '16 09:02 victor3rc

Hey @remyroy @ijacquez, just an update: I managed to read chunks but I was getting some errors when I was calling pylzma.decompressobj.decompress(chunk), specifically at the end of the file, on the final chunks.

A temporary solution I have found to the problem is to use subprocess to call 7z and decompress the file locally. I then read whatever is decompressed in chunks.

Feb 17 '16 15:02 victor3rc
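
The subprocess workaround described above could look roughly like the sketch below. This is not the code victor3rc actually ran. It assumes a 7z executable is available on PATH, and the helper names (build_extract_command, extract_with_7z, iter_chunks) are hypothetical, chosen only for this example.

```python
import os
import subprocess

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an arbitrary size chosen for this sketch


def build_extract_command(archive_path, out_dir, sevenzip="7z"):
    """Build a `7z x` command line.

    -o<dir> sets the output directory (no space after -o),
    -y answers "yes" to any prompt so the call never blocks.
    """
    return [sevenzip, "x", archive_path, "-o" + out_dir, "-y"]


def extract_with_7z(archive_path, out_dir, sevenzip="7z"):
    """Extract the archive to disk with the external 7z tool."""
    os.makedirs(out_dir, exist_ok=True)
    # check=True raises CalledProcessError if 7z exits with an error code.
    subprocess.run(build_extract_command(archive_path, out_dir, sevenzip), check=True)


def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield the contents of an extracted file in fixed-size chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

The trade-off is disk space for memory: the decompressed data has to fit on disk, but Python never holds more than one chunk of it at a time.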

Do you have an idea as to what is causing that? Is it your changes? Are the chunks too big?

Feb 21 '16 22:02 ijacquez

No idea, sorry. I didn't have time to look into pylzma.decompressobj.decompress, which is where the error was happening. It wouldn't be the size of the chunks, since that method is also used to read the entire file.

Feb 22 '16 11:02 victor3rc

@victor3rc, could you post your chunk-reading code, even if it didn't fully work?

Jun 05 '19 03:06 igkins
