pdfreader
pdfreader copied to clipboard
Python API for PDF documents
========= pdfreader
:Info: See the tutorials & documentation <https://pdfreader.readthedocs.io>
_ for more information.
:Author & Maintainer: Maksym Polshcha [email protected]
See GitHub <https://github.com/maxpmaxp/pdfreader>
_ for the latest source.
About
pdfreader is a Pythonic API for: * extracting texts, images and other data from PDF documents (plain or protected) * accessing different objects within PDF documents
pdfreader is NOT a tool (maybe one day it become!): * to create or update PDF files * to split PDF files into pages or other pieces * convert PDFs to any other format
Nevertheless it can be used as a part of such tools.
See Tutorials & Documentation <https://pdfreader.readthedocs.io>
_.
Features
- Extracts texts (plain text and formatted text objects)
- Extract PDF forms data (pure strings and formatted text objects)
- Supports all PDF encodings, CMap, predefined cmaps.
- Extracts images and image masks as
Pillow/PIL Images <https://pillow.readthedocs.io/en/stable/reference/Image.html>
_ - Supports encrypted and password-protected PDF documents
- Allows browse any document objects, resources and extract any data you need (fonts, annotations, metadata, multimedia, etc.)
- Follows
PDF-1.7 specification <https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf>
_ - Lazy objects access allows to process huge PDF documents quite fast
Installation
pdfreader can be installed with pip <http://pypi.python.org/pypi/pip>
_::
$ python -m pip install pdfreader
Or easy_install
from
setuptools <http://pypi.python.org/pypi/setuptools>
_::
$ python -m easy_install pdfreader
You can also download the project source and do::
$ python setup.py install
Tutorial and Documentation
Tutorial, real-life examples and documentation <https://pdfreader.readthedocs.io>
_
Support, Bugs & Feature Requests
pdfreader uses GitHub issues <https://github.com/maxpmaxp/pdfreader/issues>
_ to keep track of bugs,
feature requests, etc.
Related Projects
-
pdfminer <https://github.com/euske/pdfminer>
_ -
pyPdf2 <https://github.com/py-pdf/PyPDF2>
_ -
xpdf <http://www.foolabs.com/xpdf/>
_ -
pdfbox <http://pdfbox.apache.org/>
_ -
mupdf <http://mupdf.com/>
_
References
-
Document management - Potable document format - PDF 1.7 <https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf>
_ -
Adobe CMap and CIDFont Files Specification <https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5014.CIDFont_Spec.pdf>
_ -
PostScript Language Reference Manual <https://www-cdf.fnal.gov/offline/PostScript/PLRM2.pdf>
_ -
Adobe CMap resources <https://github.com/adobe-type-tools/cmap-resources>
_ -
Adobe glyph list specification (AGL) <https://github.com/adobe-type-tools/agl-specification>
_
Donation
If this project is helpful, you can treat me to coffee :-)
.. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=VMVFZSDHDFVK6&item_name=PDFReader+support¤cy_code=USD&source=url