pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
Hi, I am trying to extract several text blocks (using pdfquery https://github.com/jcushman/pdfquery, but it mostly depends on the pdfminer backend). Most of the extractions work well but sometimes the first character...
The current version of encodingdb.name2unicode(name: str) -> str can't handle Type1 font diffs like: 2, /'MT110', /'MT50',... It'll decode the diff as cid3, cid 4., ... Compared with a previous...
Hello guys, I recently integrated camelot to convert my PDF files to dataframes, with a FastAPI upload process. Currently the processing time is 3 minutes per file; after digging deeper...
The order of the text is mixed up and text appears in the wrong places: **I'm using the following code:** ``` output_string = StringIO() with open('/Users/udayallu/similarity_search_training/Pol_ProcHdbk1_23.pdf', 'rb') as in_file: parser = PDFParser(in_file)...
## File for reproducing the bug [2.pdf](https://github.com/pdfminer/pdfminer.six/files/5399532/2.pdf) ## Description When running the following code from the [official documentation](https://pdfminersix.readthedocs.io/en/latest/tutorial/extract_pages.html) on the linked file: ```python from pdfminer.high_level import extract_pages from pdfminer.layout...
**Bug report** I'm seeing a crash in the latest release of pdfminer.six (20200726) with certain PDF files. Unfortunately for privacy reasons I can't share these. The crash is caused because...
**Bug report** Environment: window64--Python 3.6 + Spyder 3.2.8 + pdfminer.six-20200726 ======code============ ```python import pdfminer from pdfminer.layout import LAParams from pdfminer.converter import PDFPageAggregator from pdfminer.pdfpage import PDFPage from pdfminer.layout import LTTextBoxHorizontal...
- A description of the bug: once you install pdfminer.six in Anaconda, you cannot run pdf2txt.py - Steps to reproduce the bug. Try to minimize the number of steps needed....
**Bug report** _When loading a pdf file:_ The **Type** key is not in the **stream** dictionary, which raises a KeyError. The pdf file I used is [here](https://www.ema.europa.eu/en/documents/product-information/cerdelga-epar-product-information_fr.pdf) Environment: macOS11.0.1 --Python...
This problem occurs when there are **ATTACHMENTS** present within a pdf file. I have provided a sample file at the link below: [attachment_test.pdf](https://github.com/pdfminer/pdfminer.six/files/5507157/attachment_test.pdf) Screenshot of an example file: _Originally...