
IndexError: list index out of range

Open • aristotll opened this issue 9 years ago • 1 comment

I get this error, although the examples are processed successfully:

  File "general.py", line 12, in <module>
    json = main.run("Programming with PDFMiner.pdf", "./images/")
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\main.py", line 82, in run
    dict_book = text_to_dict(pdf_file)
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\main.py", line 13, in text_to_dict
    b = book(xml)
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\lib\book.py", line 20, in __init__
    self._pages.append(page(p))
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\lib\page.py", line 17, in __init__
    self._paragraphs.append(paragraph(p))
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\lib\paragraph.py", line 39, in __init__
    self._lines.append(line(l))
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\lib\line.py", line 40, in __init__
    self._chars.append(char(c))
  File "D:\codegit\python2.7\pdfminer\content-extractor-master\pdfreader\lib\char.py", line 30, in __init__
    self._font = xml_char.get('font').split('+')[1] if xml_char.get('font') != None else None
IndexError: list index out of range

aristotll • Jan 05 '16 05:01
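The failing expression on line 30 of `char.py` assumes every font name contains a `+`. In PDF files, embedded font subsets are named with a six-letter tag followed by a plus sign (for example `ABCDEF+Times-Roman`), but a font that is not subsetted (such as a standard `Helvetica`) carries no tag. For such a name `split('+')` returns a one-element list, so index `[1]` is out of range. The snippet below isolates that expression; `font_from_xml` is a hypothetical wrapper, and its argument stands in for `xml_char.get('font')`. The font names are illustrative, not taken from the reporter's PDF.

```python
def font_from_xml(font):
    # Hypothetical wrapper around the expression from char.py, line 30;
    # `font` stands in for xml_char.get('font').
    return font.split('+')[1] if font != None else None


print(font_from_xml('ABCDEF+Times-Roman'))  # subset font: prints Times-Roman
print('Helvetica'.split('+'))               # untagged name: prints ['Helvetica']
try:
    font_from_xml('Helvetica')
except IndexError as exc:
    print('IndexError: %s' % exc)           # prints IndexError: list index out of range
```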

I fixed this by manually checking the font name in this commit: https://github.com/Micka33/content-extractor/pull/5/commits/a405cfb85717d333eecb0b190fe7fb9c04670449

ymollard • Jan 25 '18 14:01
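One way to make such a check is to strip the prefix only when a `+` is actually present. The helper below is a sketch under that assumption, not the code from the linked commit. `strip_subset_prefix` is a hypothetical name, and returning the full name for untagged fonts (rather than `None`) is a design choice. It uses `str.partition`, which never raises when the separator is missing.

```python
def strip_subset_prefix(font):
    """Return a PDF font name without its subset tag (e.g. 'ABCDEF+').

    Hypothetical helper, not taken from the linked commit. Names without
    a '+' (non-subsetted fonts) are returned unchanged and None stays None,
    so the IndexError from char.py line 30 cannot occur.
    """
    if font is None:
        return None
    # partition splits on the first '+'; `sep` is '' when there is none.
    tag, sep, name = font.partition('+')
    return name if sep else font


print(strip_subset_prefix('ABCDEF+Times-Roman'))  # prints Times-Roman
print(strip_subset_prefix('Helvetica'))           # prints Helvetica
print(strip_subset_prefix(None))                  # prints None
```

With this helper, line 30 of `char.py` could become `self._font = strip_subset_prefix(xml_char.get('font'))`.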