pdfminer issues

Merge fixes from pdfminer.six

Hi! Since Python 2 is dead and both projects now have the same goal, do they still need to be independent? See #210 and #243.

Recursing

XMLConverter should not hard code `encoding="utf-8"`

1

Code:

```
from io import StringIO

from pdfminer.converter import XMLConverter
from pdfminer.pdfinterp import PDFResourceManager

rsrcmgr = PDFResourceManager()
xmlstream = StringIO()
device = XMLConverter(rsrcmgr, xmlstream)
# xmlstream now contains something like:
# ''
```

While that encoding is pretty...

SleepyJay

Bug in PDF parsing when the page has two columns of text

4

First of all, thanks for this great tool for parsing PDFs. I am facing issues when extracting text from two-column pages in a PDF (a research paper). In such cases,...

ParikhKadam

pdfminer Unable to install on Ubuntu 18

3

`pip install pdfminer` Error Running setup.py install for pdfminer ... error Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-eyesmn/pdfminer/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-MuNlNG-record/install-record.txt --single-version-externally-managed --compile" failed with...

pcakhilnadh

AssertionError in drange when process a page

Hi. I get an error when processing a page in some PDF files. Code: ``` fp = open(filename, 'rb') # Create a PDF parser object associated with the file object. parser...

jserrano-rebold

Fix bug that some types of images are exported incorrectly.

vincent0629

Update index.html for Python 3 support

1

cjmayo

scanned pdf or native pdf

2

Given a PDF file, how can I tell whether it is a native PDF or a scanned PDF using `pdfminer`? Any suggestions?

longbowking

how to collect font list from pdf file

`pdffonts` can collect all fonts used in a pdf file, e.g. [Link](https://stackoverflow.com/questions/11820241/) ```bash pdffonts bash-manpage.pdf name type encoding emb sub uni object ID ------------------------------- ------------- --------------- --- --- --- ---------...

longbowking

Fails on start

``` $ git clone git@github.com:euske/pdfminer.git $ cd pdfminer $ python3 ./tools/dumppdf.py Traceback (most recent call last): File "./tools/dumppdf.py", line 17, in <module> from pdfminer.utils import isnumber, q ImportError: cannot import name...

larytet
