pdfminer.six

Community maintained fork of pdfminer - we fathom PDF

Results: 302 pdfminer.six issues

**Bug report**

**- A description of the bug**
Bad HTML markup generated while using `pdf2txt.py test.pdf -t html -o test.html`

**- Steps to reproduce the bug.**
1. Use the following...

type: bug
component: converter
status: needs solution

In a call to `get_pages`, this PDF raised an exception.

pdfminer version: refs/tags/20201018
PDF: https://source.android.com/compatibility/5.1/android-5.1-cdd.pdf

My code looks like this:

```python
raw_input = io.BytesIO(content)  # The file contents
html_output =...
```

type:anomaly
status: needs solution

Hi, I am not able to find any combination of LAParams that correctly converts the attached simple PDF to text. In the resulting text, the lines are not in the correct sequence: Expected...

type: question
status: needs solution

Hi, I've got this PDF (see attachment) which opens just fine in a PDF viewer but fails to get parsed:

```
PDFSyntaxError Traceback (most recent call last) in () 7...
```

type:anomaly
status: needs solution

I have some PDF documents that look like:

```
%PDF
%objects
xref
%table
trailer
```

type:anomaly
status: needs solution

**Bug report** Copy of #471 (by @imochoa) Sadly, I cannot upload the problematic PDFs due to a non-disclosure agreement. I can, however, point out the issue and share my fix. When...

type: bug
status: needs solution

### Description
In my PDF I have some math formulas. I encounter no problem when reading the file with `pdfminer`, but the position of the math is wrong. Because of...

type: bug
component: converter
status: needs solution

hOCR is an open standard of data representation for formatted text obtained from optical character recognition (OCR).

type: new feature
status: needs solution
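
For readers unfamiliar with hOCR, here is a minimal sketch of what emitting one hOCR text line from word boxes could look like. It is not pdfminer.six code: the helper `words_to_hocr_line`, the `Word` tuple layout, and the choice to treat one PDF point as one hOCR pixel are assumptions made for illustration. It relies only on the hOCR conventions of `ocr_line` / `ocrx_word` classes and a `bbox x0 y0 x1 y1` entry in the `title` attribute, measured from the top-left corner of the page.

```python
from html import escape
from typing import List, Tuple

# Hypothetical input: (text, x0, y0, x1, y1) in PDF coordinates,
# where the origin is the bottom-left corner of the page.
Word = Tuple[str, float, float, float, float]


def words_to_hocr_line(words: List[Word], page_height: float) -> str:
    """Render words as one hOCR line (illustrative helper, not a pdfminer API)."""
    spans = []
    for text, x0, y0, x1, y1 in words:
        # hOCR measures y from the top of the page, so flip the y axis.
        top = round(page_height - y1)
        bottom = round(page_height - y0)
        title = f"bbox {round(x0)} {top} {round(x1)} {bottom}"
        spans.append(
            f"<span class='ocrx_word' title='{title}'>{escape(text)}</span>"
        )
    return "<span class='ocr_line'>" + " ".join(spans) + "</span>"


if __name__ == "__main__":
    # One word on a US Letter page (792 points high).
    print(words_to_hocr_line([("Hello", 10, 700, 50, 712)], page_height=792))
```

A real converter would also wrap lines in `ocr_page` and paragraph elements and scale coordinates to the target resolution; this sketch only shows the coordinate flip and the markup shape.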

Currently, we have a couple of PDFs as test cases. However, a lot of bug reports come with problematic PDFs. It would be great if we could add regression tests...

type: development
status: needs solution

The current distance function computes the area between two textboxes. This can prioritize grouping textboxes A and B even when C lies between them. This is...

type: new feature
status: needs solution
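
The grouping problem described in the last issue can be reproduced with a small standalone sketch. It assumes that "the area between two textboxes" means the area of the joint bounding box minus the areas of the two boxes; the function names and the coordinates of A, B, and C are made up for illustration and are not taken from the issue or from the library.

```python
from itertools import combinations
from typing import Dict, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def area(box: BBox) -> float:
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)


def area_distance(a: BBox, b: BBox) -> float:
    """Empty area inside the joint bounding box of a and b (assumed metric)."""
    joint = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    return area(joint) - area(a) - area(b)


def closest_pair(boxes: Dict[str, BBox]) -> Tuple[str, str]:
    """Return the names of the two boxes with the smallest area distance."""
    return min(
        combinations(sorted(boxes), 2),
        key=lambda pair: area_distance(boxes[pair[0]], boxes[pair[1]]),
    )


# Hypothetical layout: C is a wide, thin box sitting vertically between A and B.
boxes = {
    "A": (0, 0, 10, 10),
    "B": (0, 14, 10, 24),
    "C": (0, 11, 100, 13),
}

if __name__ == "__main__":
    for first, second in combinations(sorted(boxes), 2):
        print(first, second, area_distance(boxes[first], boxes[second]))
    print("grouped first:", closest_pair(boxes))
```

Under this metric A and B have the smallest distance because C extends far to the right and inflates any bounding box it joins, so A and B would be grouped first even though C lies between them.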