I ran into the following error with my own PDF, although the example files are processed successfully.
  File "general.py", line 12, in <module>
    json = main.run("Programming with PDFMiner.pdf", "./images/")
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\main.py", line 82, in run
    dict_book = text_to_dict(pdf_file)
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\main.py", line 13, in text_to_dict
    b = book(xml)
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\lib\book.py", line 20, in __init__
    self._pages.append(page(p))
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\lib\page.py", line 17, in __init__
    self._paragraphs.append(paragraph(p))
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\lib\paragraph.py", line 39, in __init__
    self._lines.append(line(l))
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\lib\line.py", line 40, in __init__
    self._chars.append(char(c))
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\lib\char.py", line 30, in __init__
    self._font = xml_char.get('font').split('+')[1] if xml_char.get('font') != None else None
IndexError: list index out of range
I fixed this by adding a manual check of the font name in this commit: https://github.com/Micka33/content-extractor/pull/5/commits/a405cfb85717d333eecb0b190fe7fb9c04670449
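
For anyone hitting the same traceback, here is a minimal sketch of what such a check could look like. It assumes the error is caused by font names that contain no '+' (that is, no subset prefix such as 'ABCDEF+'), so that split('+') returns a single-element list and index 1 does not exist. The helper name parse_font_name is hypothetical, and this is not necessarily what the linked commit does.

```python
def parse_font_name(font):
    """Return the font name without a subset prefix, or None if no font is given.

    Hypothetical helper, not part of content-extractor.
    """
    if font is None:
        return None
    # Subset fonts look like 'ABCDEF+Helvetica'; other fonts are just 'Helvetica'.
    # partition() never raises: when there is no '+', sep is the empty string.
    prefix, sep, name = font.partition('+')
    # Keep the part after the first '+', or the whole name if there is no '+'.
    return name if sep else font
```

Under that assumption, line 30 of char.py could become self._font = parse_font_name(xml_char.get('font')). Note that for a name with several '+' characters this keeps everything after the first one, whereas split('+')[1] would keep only the second piece.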