pypdf
A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.
I'm trying to extract text from a pdf together with the position of the text. When I do it in pypdf 3.16 I get the expected result, but I don't...
## Explanation Hello, I am exploring how to populate a pdf form using pypdf. The pdf form I am working on is the following one: https://www.uspto.gov/sites/default/files/patents/process/file/efs/guidance/updated_IDS.pdf It is used for...
I am trying to parse [this PDF](https://www.joinville.sc.gov.br/wp-content/uploads/2023/11/Pesquisa-de-Precos-Combustiveis-novembro-2023.pdf). However, the output of extract_text() contains a bunch of spaces that are not in the original PDF. See the screenshot...
I get garbled characters when parsing a PDF file. The file I use is [this](http://www.aas.net.cn/fileZDHXB/journal/article/zdhxb/2012/8/PDF/20120812.pdf). Could this be an encoding issue? ## Environment ```bash $ python -m platform Linux-4.18.0-147.5.1.6.h841.eulerosv2r9.x86_64-x86_64-with-glibc2.17 $ python -c...
proposal to complete #2203
Provides a common interface to access root, info, and id. The objective is to prepare some code factorization between PdfWriter and PdfReader.
I am trying to extract images from PDF files; however, PIL occasionally raises a 'not enough image data' exception when handling certain PDFs. The files look correct in Atril...
I am trying to use PdfReader and PdfWriter to read and write annotations in a PDF file. I use a PDF file produced by Microsoft Word -> Save As PDF. The Word file has 3...