Unable to get page count on Windows 11 with poppler 23.10.0 and pdf2image 1.17.0
Description
I encountered the error "Unable to get page count." when using the pdf2image library to process PDF files. Here are the details:
Environment Information
- Operating System: Windows 11
- Python Version: 3.13.2
- pdf2image Version: 1.17.0
- poppler Version: Release-23.10.0-0
Steps to Reproduce
- Install Python, pdf2image, and poppler with the above-mentioned versions, and add the bin directory of poppler to the system's PATH environment variable.
- Run the following Python code:
from pdf2image import convert_from_path
pdf_path = 'path/to/your/pdf/file.pdf'
images = convert_from_path(pdf_path)
- When the convert_from_path function is executed, the error "Unable to get page count." is thrown.
Expected Result
The code should be able to read the PDF file normally and convert it into a list of images.
Actual Result
The code throws an error "Unable to get page count." and cannot continue the PDF conversion operation.
Error Message
Traceback (most recent call last):
File "H:\Anaconda_envs\envs\LLM\Lib\site-packages\pdf2image\pdf2image.py", line 256, in _page_count
return int(re.search(r'Pages:\s+(\d+)', out.decode("utf8", "ignore")).group(1))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "H:\4.IDESpace\LLM_S\pdfs\pdf2ImageTest.py", line 10, in <module>
images = convert_from_path(r'H:\4.IDESpace\LLM_S\pdfs\testPDF.pdf')
File "H:\Anaconda_envs\envs\LLM\Lib\site-packages\pdf2image\pdf2image.py", line 55, in convert_from_path
page_count = _page_count(pdf_path, userpw, poppler_path=poppler_path)
File "H:\Anaconda_envs\envs\LLM\Lib\site-packages\pdf2image\pdf2image.py", line 258, in _page_count
raise PDFPageCountError('Unable to get page count. %s' % err.decode("utf8", "ignore"))
pdf2image.exceptions.PDFPageCountError: Unable to get page count.
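The first traceback shows where the failure starts: pdf2image runs pdfinfo and searches its stdout for a "Pages:" line. When stdout contains no such line, re.search returns None, the .group(1) call raises the AttributeError, and pdf2image re-raises it as PDFPageCountError with pdfinfo's stderr appended (empty in this report, since nothing follows "Unable to get page count."). The minimal sketch below reproduces only that parsing step, using the same regex shown in the traceback; the function name parse_page_count is hypothetical and returns None instead of crashing.

```python
import re
from typing import Optional

# Same pattern the traceback shows pdf2image applying to pdfinfo's stdout.
PAGES_RE = re.compile(r"Pages:\s+(\d+)")


def parse_page_count(out: bytes) -> Optional[int]:
    """Return the page count from raw pdfinfo stdout, or None if there is no 'Pages:' line.

    pdf2image calls .group(1) on the match directly, so empty or unexpected
    stdout becomes AttributeError: 'NoneType' object has no attribute 'group'.
    """
    match = PAGES_RE.search(out.decode("utf8", "ignore"))
    return int(match.group(1)) if match else None


healthy = b"Title: test\nPages:          3\nEncrypted:      no\n"
print(parse_page_count(healthy))  # 3
print(parse_page_count(b""))      # None
```

In other words, the AttributeError itself is only a symptom: the question is why pdfinfo printed nothing.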
Additional Information
I've tried multiple different PDF files, and this error occurs every time. Meanwhile, I've confirmed that the installation path of poppler has been correctly configured in the system environment variables.
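Because pdf2image looks up pdfinfo through the PATH of the running Python process (or in the directory passed as its poppler_path argument), it may help to confirm that this specific interpreter, not just a fresh terminal, resolves the expected executable. A minimal standard-library sketch; find_pdfinfo is a hypothetical helper name.

```python
import os
import shutil
from typing import Optional


def find_pdfinfo(poppler_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the pdfinfo executable, or None if it cannot be found.

    With poppler_path set, only that directory is searched; otherwise the PATH
    of the *current* process is used.
    """
    # On Windows, shutil.which also tries PATHEXT suffixes such as .exe.
    return shutil.which("pdfinfo", path=poppler_path)


if __name__ == "__main__":
    print("pdfinfo found at:", find_pdfinfo())
    # Show PATH entries mentioning poppler, to spot typos or stale entries.
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        if "poppler" in entry.lower():
            print("PATH entry:", entry)
```

If this prints None from inside the same environment (for example, the Anaconda environment in the traceback), one possible cause is that the IDE or shell was started before PATH was edited and still carries the old value.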
I hope to get help to solve this problem. Thank you!
Hi. I tried to reproduce this issue by running the library on:
- Fake pdf: echo "Some random text" > broken.pdf
- Blank pdf: touch blank.pdf
- Corrupted pdf: head -c 100 original.pdf > corrupted.pdf
- Non-existent pdf/wrong path
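For anyone reproducing this on Windows, where echo, touch and head may not be available, the three broken inputs above can be recreated from Python. This is a sketch assuming a valid original.pdf exists; make_bad_pdfs is a hypothetical helper that mirrors each shell command in turn.

```python
from pathlib import Path


def make_bad_pdfs(directory: Path, original: Path) -> dict:
    """Recreate the fake, blank and corrupted test files; return name -> path."""
    directory.mkdir(parents=True, exist_ok=True)
    broken = directory / "broken.pdf"
    blank = directory / "blank.pdf"
    corrupted = directory / "corrupted.pdf"

    # echo "Some random text" > broken.pdf  (plain text, no PDF header)
    broken.write_text("Some random text\n")
    # touch blank.pdf  (zero-byte file)
    blank.write_bytes(b"")
    # head -c 100 original.pdf > corrupted.pdf  (first 100 bytes only)
    corrupted.write_bytes(original.read_bytes()[:100])
    return {"broken": broken, "blank": blank, "corrupted": corrupted}
```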
Observations:
- None of these could be opened using the open tool on macOS.
Then I ran this code on each of these cases:
>>> from pdf2image import convert_from_path
>>> images = convert_from_path("<pdf_name>.pdf")
Results:
- broken.pdf
page_count = pdfinfo_from_path(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 611, in pdfinfo_from_path
raise PDFPageCountError(
pdf2image.exceptions.PDFPageCountError: Unable to get page count.
Syntax Warning: May not be a PDF file (continuing anyway)
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't read xref table
- blank.pdf
page_count = pdfinfo_from_path(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 611, in pdfinfo_from_path
raise PDFPageCountError(
pdf2image.exceptions.PDFPageCountError: Unable to get page count.
Syntax Error: Document stream is empty
- corrupted.pdf
page_count = pdfinfo_from_path(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 611, in pdfinfo_from_path
raise PDFPageCountError(
pdf2image.exceptions.PDFPageCountError: Unable to get page count.
Syntax Error: Unterminated string
Syntax Error: End of file inside dictionary
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't read xref table
- Non existent pdf
page_count = pdfinfo_from_path(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 611, in pdfinfo_from_path
raise PDFPageCountError(
pdf2image.exceptions.PDFPageCountError: Unable to get page count.
I/O Error: Couldn't open file 'corrputed.pdf': No such file or directory.
So the "Unable to get page count" error you are seeing could be caused by any of these, or by some other scenario. Would it be possible to attach the actual PDFs, @ZKHMao?
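Since the same PDFPageCountError covers all of these cases, a cheap pre-check before calling pdf2image can tell them apart. This is a heuristic sketch, not a PDF validator: it only checks that the file exists, is non-empty, has a "%PDF-" header in its first 1024 bytes, and has an "%%EOF" marker in its last 1024 bytes. The function name precheck_pdf and its return labels are hypothetical; the comments map each label to the pdfinfo message seen in the results above.

```python
from pathlib import Path


def precheck_pdf(path: str) -> str:
    """Classify a file as missing, empty, not-a-pdf, truncated, or looks-ok."""
    p = Path(path)
    if not p.is_file():
        return "missing"      # I/O Error: Couldn't open file
    size = p.stat().st_size
    if size == 0:
        return "empty"        # Syntax Error: Document stream is empty
    with p.open("rb") as f:
        head = f.read(1024)
        f.seek(max(0, size - 1024))
        tail = f.read()
    if b"%PDF-" not in head:
        return "not-a-pdf"    # Syntax Warning: May not be a PDF file
    if b"%%EOF" not in tail:
        return "truncated"    # Syntax Error: Couldn't read xref table
    return "looks-ok"
```

If every file reported in this issue comes back as "looks-ok", the problem is more likely in the poppler setup than in the PDFs themselves.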
Got the same issue: no matter what PDF I feed it, it complains with pdf2image.exceptions.PDFPageCountError: Unable to get page count.
Poppler is installed correctly, and out is always empty at this point:
out, err = proc.communicate(timeout=timeout)
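When out comes back empty, running pdfinfo outside pdf2image and looking at the return code together with stdout and stderr can separate a poppler problem from a pdf2image problem. A minimal sketch, assuming pdfinfo is resolvable either on PATH or in the given directory; run_pdfinfo is a hypothetical helper and only approximates how pdf2image launches the tool.

```python
import shutil
import subprocess
from typing import Optional, Tuple


def run_pdfinfo(
    pdf_path: str, poppler_path: Optional[str] = None, timeout: Optional[float] = None
) -> Tuple[int, str, str]:
    """Run pdfinfo on pdf_path and return (returncode, stdout, stderr).

    Raises FileNotFoundError if pdfinfo cannot be located.
    """
    exe = shutil.which("pdfinfo", path=poppler_path)
    if exe is None:
        raise FileNotFoundError("pdfinfo not found on PATH or in poppler_path")
    # Capture both streams so an empty stdout can be compared against stderr.
    proc = subprocess.run([exe, pdf_path], capture_output=True, timeout=timeout)
    return (
        proc.returncode,
        proc.stdout.decode("utf8", "ignore"),
        proc.stderr.decode("utf8", "ignore"),
    )
```

Usage: `code, out, err = run_pdfinfo(r"H:\4.IDESpace\LLM_S\pdfs\testPDF.pdf")`. An empty stdout with a non-zero return code and an empty stderr would suggest that pdfinfo itself failed to start or crashed, rather than that it rejected the PDF.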