
Memory leak when opening an invalid PDF with no %%EOF in tail

Open cmyers009 opened this issue 1 year ago • 4 comments

Description of the bug

If a PDF file's bytes are cut off prematurely, fitz opens the file as a document with 0 pages but also leaks memory.

How to reproduce the bug

You can reproduce this bug by taking a large PDF file and removing the last 50% of its bytes.

If you repeatedly load files like this, memory leaks even if you call doc.close() each time.
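The reproduction steps above could be scripted roughly as follows. This is only a sketch: `write_truncated_copy` and `open_repeatedly` are hypothetical helper names, the 50% fraction and the round count are arbitrary, and the loop does not measure memory itself, so process memory has to be watched externally (for example in Task Manager on Windows).

```python
import os


def write_truncated_copy(src_path, dst_path, keep_fraction=0.5):
    """Copy the first keep_fraction of src_path to dst_path; return bytes written."""
    size = os.path.getsize(src_path)
    keep = int(size * keep_fraction)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read(keep))
    return keep


def open_repeatedly(path, rounds=1000):
    """Open and close the same file many times so any per-open leak adds up."""
    # PyMuPDF is imported lazily so the truncation helper needs only the stdlib.
    import fitz

    for _ in range(rounds):
        try:
            doc = fitz.open(path)
        except Exception:
            # Assumption: the damaged file might be rejected instead of opened.
            continue
        doc.close()
```

Typical use would be `write_truncated_copy("big.pdf", "cut.pdf")` followed by `open_repeatedly("cut.pdf")`, where both file names are placeholders.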

You can check whether the file has an %%EOF marker with this code. If you call it before opening the document with fitz.open(), you can return 0 pages without triggering the memory leak.

    import os

    def has_eof_marker(file_path):
        try:
            with open(file_path, 'rb') as file:
                # Seek to the last 1KB of the file; clamp at 0 because a
                # negative seek fails on files smaller than 1KB
                file.seek(0, os.SEEK_END)
                file.seek(max(0, file.tell() - 1024))
                # Read the last 1KB
                tail = file.read()
                # Check if %%EOF is in the last 1KB
                return b'%%EOF' in tail
        except Exception as e:
            print(f"Error reading file: {e}")
            return False
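One way to wire such a check in front of the open call is sketched below. `tail_contains_eof` and `guarded_page_count` are hypothetical names, and the tail check is restated in compact form only so the block runs on its own. The `open_document` parameter is an assumption added for illustration: it defaults to `fitz.open` and lets the guard be exercised without PyMuPDF installed.

```python
import os


def tail_contains_eof(path, tail_size=1024):
    """Return True if b'%%EOF' occurs in the last tail_size bytes of the file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        # Clamp at 0 so files shorter than tail_size are read from the start.
        f.seek(max(0, f.tell() - tail_size))
        return b"%%EOF" in f.read()


def guarded_page_count(path, open_document=None):
    """Return 0 for files without an %%EOF marker, without opening them."""
    if not tail_contains_eof(path):
        return 0
    if open_document is None:
        import fitz  # PyMuPDF, imported only when a file is actually opened
        open_document = fitz.open
    doc = open_document(path)
    try:
        return doc.page_count
    finally:
        doc.close()
```

Note that this only sidesteps files with a missing marker. A file can contain %%EOF and still be damaged elsewhere, so it is a workaround rather than a fix for the leak.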

    PyMuPDF 1.23.4

PyMuPDF version

1.23.8 or earlier

Operating system

Windows

Python version

3.11

cmyers009, Apr 04 '24 18:04

I cannot reproduce this with the current version, PyMuPDF-1.24.1. What version of PyMuPDF are you using?

I am using 1.23.4

On Thu, Apr 4, 2024 at 3:46 PM Julian Smith @.***> wrote:

I cannot reproduce this with the current version, PyMuPDF-1.24.1. What version of PyMuPDF are you using?


cmyers009, Apr 04 '24 22:04

There have been quite a few improvements to memory handling since 1.23.4 and so it would be worth retrying with the latest version, 1.24.1.
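To confirm which version is actually installed before and after upgrading, something like the following could be used. It is a sketch that relies only on the standard library's `importlib.metadata`. `installed_version` and `version_tuple` are hypothetical helpers, and the distribution name "PyMuPDF" is assumed to match the name the package was installed under.

```python
import re
from importlib import metadata


def installed_version(dist_name="PyMuPDF"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.23.4' into (1, 23, 4) so versions compare numerically."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("PyMuPDF is not installed")
    elif version_tuple(found) < version_tuple("1.24.1"):
        print(f"PyMuPDF {found} is older than 1.24.1")
    else:
        print(f"PyMuPDF {found}")
```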

Ok, I'll check it out.

On Fri, Apr 5, 2024 at 2:22 AM Julian Smith @.***> wrote:

There have been quite a few improvements to memory handling since 1.23.4 and so it would be worth retrying with the latest version, 1.24.1.


cmyers009, Apr 05 '24 18:04

Closing this because we have been waiting for information for over a month.