pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Fixes #2401. This change is a bit tricky, as it can be interpreted as a breaking one. We previously did not ensure that the user would use this correctly and...
## Explanation When you have Arabic text mixed with digits, the text extraction order is messed up. Below is an example. 1. Reading from right to left, here's the ground...
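This Python sketch illustrates the kind of reordering involved. It is deliberately simplified. It assumes that the characters of one right-to-left line arrive in visual order, meaning left to right as drawn on the page. `visual_to_logical_rtl` is a hypothetical helper, not part of pypdf. It covers only the simplest case of the Unicode Bidirectional Algorithm: runs of European digits (bidi class `EN`) keep their left-to-right order, and everything else is reversed.

```python
import unicodedata
from itertools import groupby


def visual_to_logical_rtl(visual):
    """Reorder one right-to-left line from visual order to logical order.

    Deliberately simplified (hypothetical helper): runs of European digits
    (bidi class "EN") keep their left-to-right order; all other characters
    are treated as right-to-left. This is NOT the full Unicode Bidi Algorithm.
    """
    runs = [
        (is_digit_run, "".join(chars))
        for is_digit_run, chars in groupby(
            visual, key=lambda ch: unicodedata.bidirectional(ch) == "EN"
        )
    ]
    # The rightmost run on the page comes first in reading order, so walk the
    # runs backwards and flip every non-digit run.
    return "".join(
        text if is_digit_run else text[::-1] for is_digit_run, text in reversed(runs)
    )


# Visual order as drawn left to right: "12 " followed by the letters of the word reversed.
visual = "12 \u0628\u062a\u0643"
print(visual_to_logical_rtl(visual) == "\u0643\u062a\u0628 12")  # True
```

A real fix would need the full algorithm (Unicode Standard Annex #9). For example, this sketch splits numbers such as `3.5` into separate runs because the separator is not in class `EN`.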
## Explanation I'm using PyPDF2 to extract metadata from PDF documents. For my use case I'm only interested in the document's metadata, not the content itself. ## Code Example ```python from...
I would like to programmatically add alternative text for screen readers to images. Is that possible right now? The way the feature could be used is to identify a node...
I would like to dynamically remove certain annotations from a page but not others. I solved it like this: ```python from pypdf import PageObject, PdfWriter, PdfReader from pypdf.constants import PageAttributes...
## Explanation I found an example for the /JBIG2Decode filter :-) ## Code Example PDF: https://github.com/py-pdf/pypdf/files/12090692/New.Jersey.Coinbase.staking.securities.charges.2023-0606_Coinbase-Penalty-and-C-D.pdf ```python from pypdf import PdfReader, __version__ print(f"pypdf=={__version__}") reader = PdfReader("New.Jersey.Coinbase.staking.securities.charges.2023-0606_Coinbase-Penalty-and-C-D.pdf") page = reader.pages[0] for...
## Explanation According to https://github.com/py-pdf/pypdf/blob/11ee6480a3f795d770da89944f32a977e3c110e2/pypdf/_utils.py#L433-L449 `logger_warning` is advertised as overridable, but doing so proves more complicated than expected. There are basically two reasons for this: 1. The...
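One way to intercept these messages without overriding `logger_warning` is to configure the standard `logging` module. The sketch below assumes that `logger_warning` forwards to `logging.getLogger(src).warning(msg)`, with `src` being a module name inside the `pypdf` package (for example `pypdf._reader`). Under that assumption, a handler on the `pypdf` logger receives the messages through propagation. `CollectingHandler` is a hypothetical name, and the last lines only simulate a library warning.

```python
import logging


class CollectingHandler(logging.Handler):
    """Store WARNING-and-above records in a list instead of printing them."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = CollectingHandler()
pypdf_logger = logging.getLogger("pypdf")  # parent of all pypdf.* module loggers
pypdf_logger.addHandler(handler)
pypdf_logger.propagate = False  # keep these messages out of the root logger's output

# Simulated warning from a pypdf module logger (assumed name, for illustration only).
logging.getLogger("pypdf._reader").warning("simulated warning")
print([record.getMessage() for record in handler.records])  # ['simulated warning']
```

Attaching a handler to the package logger, rather than patching a function, also catches warnings from every pypdf module at once.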
## Explanation Superscripts are common in math, especially squares (e.g. x²) and cubes (e.g. x³). ## Code Example How would your feature be used? (Remove this if it is not...
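The following sketch shows what the output side of such a feature could look like, under one stated assumption. The extractor would need to report, for each text fragment, how far that fragment is raised above the baseline, for instance via the PDF text rise operator `Ts`. `join_fragments` is hypothetical and not a pypdf API. It maps raised digits and a few symbols to their Unicode superscript forms and leaves other characters unchanged.

```python
# Hypothetical mapping from plain characters to Unicode superscript code points.
_SUPERSCRIPT_MAP = str.maketrans("0123456789+-=()n", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ")


def join_fragments(fragments):
    """Join (text, rise) pairs; fragments with a positive rise become superscripts.

    Characters without a superscript form (e.g. most letters) are kept as-is.
    """
    parts = []
    for text, rise in fragments:
        parts.append(text.translate(_SUPERSCRIPT_MAP) if rise > 0 else text)
    return "".join(parts)


print(join_fragments([("x", 0), ("2", 3.5), (" + y", 0), ("3", 3.5)]))  # x² + y³
```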
When we extract Python code from a PDF, it's completely messed up. It would be nice to have an option that keeps the indentation. Maybe a flag for a layout mode?...
I am working on getting the "in reply to" working for annotations using pypdf but I think there is something that I am misunderstanding. Here's a basic script I thought...