PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
### Description of the bug [mscbookin.pdf](https://github.com/user-attachments/files/15982045/mscbookin.pdf)   I encountered an issue while processing the file, where the string obtained using the get_text() method was missing some data compared...
### Description of the bug Trying to get the pixmap of certain PDF documents fails: ``` File "/home/.../test.py", line 11, in _ = page.get_pixmap() ^^^^^^^^^^^^^^^^^ File "/home/.../miniconda3/lib/python3.12/site-packages/fitz/utils.py", line 888, in...
### Description of the bug I encountered a case while processing the file, which is a readable PDF. However, there is a significant deviation between the location information obtained by...
### Description of the bug Since v1.24.1 introduced the `use_objstms` option in `Document.save()`, setting `use_objstms=1` and `linear=True` together does not work on some documents and results in a broken PDF file. On version...
### Discussed in https://github.com/pymupdf/PyMuPDF/discussions/3567 Originally posted by **serhii-brovarnyk** June 11, 2024 Hello! I have a PDF file with only one page I got via another tool for PDF documents, and...
**Is your feature request related to a problem? Please describe.** YES. I'm doing RAG on a group of Brazilian laws and I think that the problem applies to all...
**Is your feature request related to a problem? Please describe.** I have a PDF which contains bookmarks. Each bookmark has a nameddest. After calling `document.get_toc()` I get a list of...
`Pixmap.pil_save()` and `Pixmap.pil_tobytes()` methods convert a `Pixmap` to a `PIL.Image` object internally, but don't expose this object to the user of the library. Having access to the Pillow object directly...
### Description of the bug When you use an unordered list (`<ul>`) inside an ordered list (`<ol>`) item, you get a broken count where it will count the items in...
### Description of the bug With some documents whose optional content layers are already disabled, the rendered pages still contain those layers; example link: https://dropmefiles.com/zTbp4 ### How to reproduce the bug ```python...