pdfminer.six

Community maintained fork of pdfminer - we fathom PDF

Results 302 pdfminer.six issues

Hi, I am trying to extract several text blocks (using pdfquery https://github.com/jcushman/pdfquery, but it mostly depends on the pdfminer backend). Most of the extractions work well but sometimes the first character...

type: bug
component:document
status: needs more info

The current version of encodingdb.name2unicode(name: str) -> str can't handle Type1 font differences like: 2, /'MT110', /'MT50',... It'll decode the differences as cid3, cid 4., ... Compared with a previous...

type: bug
component:characters
status: needs more info

Hello guys, I recently integrated Camelot to convert my PDF files to dataframes, with a FastAPI upload process. Currently, processing takes 3 minutes per file. After digging deeper...

type:performance
status: needs more info

The order of the text is mixed up, and pieces of it end up in the wrong places: **I'm using the following code:** ``` output_string = StringIO() with open('/Users/udayallu/similarity_search_training/Pol_ProcHdbk1_23.pdf', 'rb') as in_file: parser = PDFParser(in_file)...

component: converter
status: needs more info

## File for reproducing the bug [2.pdf](https://github.com/pdfminer/pdfminer.six/files/5399532/2.pdf) ## Description When running the following code from the [official documentation](https://pdfminersix.readthedocs.io/en/latest/tutorial/extract_pages.html) on the linked file: ```python from pdfminer.high_level import extract_pages from pdfminer.layout...

type: bug
component: converter
status: needs more info

**Bug report** I'm seeing a crash in the latest release of pdfminer.six (20200726) with certain PDF files. Unfortunately for privacy reasons I can't share these. The crash is caused because...

type:anomaly
status: needs more info

**Bug report** Environment: Windows 64-bit -- Python 3.6 + Spyder 3.2.8 + pdfminer.six-20200726 ======code============ ```python import pdfminer from pdfminer.layout import LAParams from pdfminer.converter import PDFPageAggregator from pdfminer.pdfpage import PDFPage from pdfminer.layout import LTTextBoxHorizontal...

type:anomaly
status: needs more info

- A description of the bug: once you install pdfminer.six in Anaconda, you cannot run pdf2txt.py
- Steps to reproduce the bug. Try to minimize the number of steps needed....

type: question
status: needs more info

**Bug report** _When loading a pdf file:_ The **Type** key is not in the **stream** dictionary, which raises a KeyError. The pdf file I used is [here](https://www.ema.europa.eu/en/documents/product-information/cerdelga-epar-product-information_fr.pdf) Environment: macOS 11.0.1 -- Python...

type:anomaly
status: needs solution
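
The failure mode in the report above (indexing a dictionary for a key that is not there) can be illustrated with a minimal sketch. It assumes a plain Python dict stands in for the stream's attribute dictionary; `stream_attrs`, `get_type_strict`, and `get_type_lenient` are hypothetical names for illustration and are not part of pdfminer.six.

```python
# Hypothetical stand-in for a stream's attribute dictionary (not pdfminer.six code).
# Note that it has no "Type" key, as in the report above.
stream_attrs = {"Length": 42, "Filter": "FlateDecode"}


def get_type_strict(attrs):
    """Index directly; raises KeyError when "Type" is absent."""
    return attrs["Type"]


def get_type_lenient(attrs, default=None):
    """Use dict.get so a missing "Type" yields a default instead of raising."""
    return attrs.get("Type", default)


# Direct indexing fails on the dictionary without "Type".
try:
    get_type_strict(stream_attrs)
    strict_failed = False
except KeyError:
    strict_failed = True

# The lenient lookup returns the default instead of raising.
lenient_result = get_type_lenient(stream_attrs)
```

Whether a missing key should be tolerated or reported is a design choice for the maintainers; the sketch only shows the difference between the two lookups.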

This problem occurs when there are **ATTACHMENTS** present within a pdf file. I have provided a sample file in the below link: [attachment_test.pdf](https://github.com/pdfminer/pdfminer.six/files/5507157/attachment_test.pdf) Screenshot of an example file: ![image](https://user-images.githubusercontent.com/35597446/98484003-816c4e00-2232-11eb-9221-fb8ff64e76a2.png) _Originally...

type:anomaly
component:parser
status: needs solution