pdfminer.six
TypeError: 'PDFObjRef' object is not iterable
After updating to version 20240706, calling extract_text() on a PDF raises TypeError: 'PDFObjRef' object is not iterable.
This did not occur with the previous version, 20231228.
Python 3.12.4 (tags/v3.12.4:8e8a4ba, Jun 6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> from pdfminer.high_level import extract_text
>>> text = extract_text("Working.pdf")
>>> text = extract_text("Error.pdf")
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    text = extract_text(path)
  File "C:\Program Files\Python312\Lib\site-packages\pdfminer\high_level.py", line 169, in extract_text
    for page in PDFPage.get_pages(
  File "C:\Program Files\Python312\Lib\site-packages\pdfminer\pdfpage.py", line 171, in get_pages
    for (pageno, page) in enumerate(cls.create_pages(doc)):
  File "C:\Program Files\Python312\Lib\site-packages\pdfminer\pdfpage.py", line 127, in create_pages
    yield cls(document, objid, tree, next(page_labels))
  File "C:\Program Files\Python312\Lib\site-packages\pdfminer\pdfpage.py", line 63, in __init__
    mediabox_params: List[Any] = [
TypeError: 'PDFObjRef' object is not iterable
>>>
Working.pdf: a newly created blank page made with Acrobat.
Error.pdf: downloaded; I cannot change how it is created. I deleted all visible text on the page, which did not appear to affect the error.
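A possible explanation, judging only from the traceback: the failing line builds a list by iterating over the page's media box, so the error would occur if Error.pdf stores that whole array as an indirect reference rather than a direct array. I have not confirmed this against the file. The sketch below reproduces that suspected failure mode with stand-ins only; `IndirectRef`, `resolve`, `parse_box_unsafe` and `parse_box_safe` are hypothetical names for illustration and are not pdfminer.six code.

```python
# Stand-in sketch: none of these names come from pdfminer.six.
class IndirectRef:
    """Mimics an indirect object reference: wraps a value, is not iterable."""

    def __init__(self, target):
        self.target = target


def resolve(obj):
    """Follow indirect references until a direct object is reached."""
    while isinstance(obj, IndirectRef):
        obj = obj.target
    return obj


def parse_box_unsafe(box):
    # Resolves each element but iterates the raw value, so it fails
    # when the array itself is an indirect reference.
    return [float(resolve(v)) for v in box]


def parse_box_safe(box):
    # Resolves the array first, then each element.
    return [float(resolve(v)) for v in resolve(box)]


direct = [0, 0, 612, 792]
indirect = IndirectRef([0, 0, IndirectRef(612), 792])

print(parse_box_safe(direct))
print(parse_box_safe(indirect))
try:
    parse_box_unsafe(indirect)
except TypeError as exc:
    print("TypeError:", exc)
```

If this is what happens, a blank page saved by Acrobat with a direct media box array would not trigger the error, which would match the difference between Working.pdf and Error.pdf.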