PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
### Description of the bug We are trying to detect any type of widget and delete it. When we perform the operation `page.delete_widget(w)` it throws AttributeError: `'Annot' object has no...
### Description of the bug I have a script that takes a PDF document URL, iterates through all the pages, generates a pixmap for each page, and uses it to...
### Description of the bug Hello, First of all, thank you for all the work you've been putting into this project. Last November, I reported a minor memory leak issue...
### Description of the bug Text removal from PDF with PyMuPDF works well with Python 3.8 and PyMuPDF 1.20.2 (Python bindings for the MuPDF 1.20.3 library), but when using Python 3.12 and...
### Description of the bug `get_text()` extracts numbers in the Cash Flow table in this document as hexadecimal characters. Copy/paste from the page and `pdftotext` extract the correct text. ###...
Relates to issue #3163. This change updates how the rectangle's area is computed from floating-point coordinates. Instead of simply casting from float to int, this function now floors the...
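The snippet above is cut off, so the PR's exact formula is unknown. The minimal sketch below only shows why flooring and plain `int()` casting differ. `int()` truncates toward zero, while `math.floor` rounds toward negative infinity, so the two disagree for negative coordinates. The helpers `truncated_bounds`, `floored_bounds` and `area` are hypothetical and are not PyMuPDF API.

```python
import math

def truncated_bounds(x0, y0, x1, y1):
    # int() truncates toward zero: -0.5 -> 0, 10.7 -> 10
    return int(x0), int(y0), int(x1), int(y1)

def floored_bounds(x0, y0, x1, y1):
    # math.floor rounds toward negative infinity: -0.5 -> -1, 10.7 -> 10
    return math.floor(x0), math.floor(y0), math.floor(x1), math.floor(y1)

def area(x0, y0, x1, y1):
    # Hypothetical helper: area of an axis-aligned rectangle (empty -> 0)
    return max(0, x1 - x0) * max(0, y1 - y0)

rect = (-0.5, -0.5, 10.7, 10.7)
print(area(*truncated_bounds(*rect)))  # 100
print(area(*floored_bounds(*rect)))    # 121
```

For purely positive coordinates the two approaches agree. They diverge only when a coordinate is negative and non-integral.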
### Description of the bug Hello, I'm using the following function to convert PDF files into PNG:
```
pix = page.get_pixmap(matrix=fitz.Matrix(40, 40), alpha=False, colorspace=fitz.csGRAY)
pix.save(image_path, "png")
```
The conversion occurs as expected...
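A zoom matrix of `fitz.Matrix(40, 40)` scales each dimension by 40, which is 72 × 40 = 2880 dpi. The rough sketch below estimates the resulting pixmap size. It assumes a US Letter page (612 × 792 points) and 1 byte per pixel for grayscale with `alpha=False`. The helper `pixmap_estimate` is hypothetical, and PyMuPDF's own rounding of the transformed page rectangle may differ by a pixel.

```python
def pixmap_estimate(width_pt, height_pt, zoom, bytes_per_pixel=1):
    # Hypothetical helper: approximate pixel dimensions and raw buffer size
    # of a page rendered with a uniform zoom matrix (assumption: no rotation).
    w = round(width_pt * zoom)
    h = round(height_pt * zoom)
    return w, h, w * h * bytes_per_pixel

w, h, nbytes = pixmap_estimate(612, 792, 40)  # assumed US Letter page
print(w, h, nbytes)      # 24480 31680 775526400
print(f"{72 * 40} dpi")  # 2880 dpi
```

At roughly 775 MB of raw samples per page, memory use and PNG encoding time grow quickly with the zoom factor.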
Even with VS Code's type checking set to "basic", I still receive an alert about not finding the `Page` base for use...
### Feature request Can OCR using Tesseract add a user-settable parameters for page segmentation mode (psm)? This would be very useful because when source documents are forms, OCR recognizes the...
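For context, Tesseract's command-line interface selects the page segmentation mode with `--psm N`. In Tesseract 4 and later, N ranges from 0 to 13, and the default is 3 (fully automatic segmentation). The sketch below builds such a command line with a user-settable psm. It does not show a PyMuPDF parameter: the helper `tesseract_args` and its defaults are hypothetical and only illustrate what the requested setting would control.

```python
from typing import List

# Page segmentation modes accepted by Tesseract 4+ (0 through 13).
VALID_PSM = range(0, 14)

def tesseract_args(image_path: str, output_base: str,
                   psm: int = 3, lang: str = "eng") -> List[str]:
    # Hypothetical helper: validate psm, then assemble the CLI invocation
    # "tesseract <image> <outputbase> --psm N -l LANG".
    if psm not in VALID_PSM:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    return ["tesseract", image_path, output_base, "--psm", str(psm), "-l", lang]

# Form-like pages often do better with sparse-text (11) or single-block (6) modes.
print(tesseract_args("form.png", "out", psm=11))
```

If Tesseract is installed, the returned list can be passed to `subprocess.run(args, check=True)`.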
**Is your feature request related to a problem? Please describe.** Python type hints (which can be validated using mypy or Pylance in vscode) are very helpful; fitz_new has much better...
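Until the bindings ship complete type hints, one possible workaround is to annotate your own code against a structural `typing.Protocol` instead of the concrete `Page` class. The sketch below assumes only a `get_text()` method returning `str`. `TextSource`, `word_count` and `FakePage` are hypothetical names, and whether a real `fitz.Page` satisfies the protocol under mypy or Pylance depends on the library's own annotations.

```python
from typing import Protocol

class TextSource(Protocol):
    # Structural type: any object with a get_text() method returning str.
    def get_text(self) -> str: ...

def word_count(page: TextSource) -> int:
    # Type-checks against the protocol, not a concrete library class.
    return len(page.get_text().split())

class FakePage:
    # Hypothetical stand-in used for testing without a PDF.
    def get_text(self) -> str:
        return "Cash flow from operations"

print(word_count(FakePage()))  # 4
```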