
Unable to get page count on Windows 11 with poppler 23.10.0 and pdf2image 1.17.0

Open · ZKHMao opened this issue 8 months ago • 2 comments

Description

I ran into the error "Unable to get page count." when using the pdf2image library to process PDF files. The details are below.

Environment Information

  • Operating System: Windows 11
  • Python Version: 3.13.2
  • pdf2image Version: 1.17.0
  • poppler Version: Release-23.10.0-0

Steps to Reproduce

  1. Install Python, pdf2image, and poppler at the versions listed above, and add poppler's bin directory to the system PATH environment variable.
  2. Run the following Python code:

     from pdf2image import convert_from_path

     pdf_path = 'path/to/your/pdf/file.pdf'
     images = convert_from_path(pdf_path)

  3. When convert_from_path is executed, it raises the error "Unable to get page count."

Expected Result

The code should be able to read the PDF file normally and convert it into a list of images.

Actual Result

The code throws an error "Unable to get page count." and cannot continue the PDF conversion operation.

Error Message

Traceback (most recent call last):
  File "H:\Anaconda_envs\envs\LLM\Lib\site-packages\pdf2image\pdf2image.py", line 256, in _page_count
    return int(re.search(r'Pages:\s+(\d+)', out.decode("utf8", "ignore")).group(1))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\4.IDESpace\LLM_S\pdfs\pdf2ImageTest.py", line 10, in <module>
    images = convert_from_path(r'H:\4.IDESpace\LLM_S\pdfs\testPDF.pdf')
  File "H:\Anaconda_envs\envs\LLM\Lib\site-packages\pdf2image\pdf2image.py", line 55, in convert_from_path
    page_count = _page_count(pdf_path, userpw, poppler_path=poppler_path)
  File "H:\Anaconda_envs\envs\LLM\Lib\site-packages\pdf2image\pdf2image.py", line 258, in _page_count
    raise PDFPageCountError('Unable to get page count. %s' % err.decode("utf8", "ignore"))
pdf2image.exceptions.PDFPageCountError: Unable to get page count. 
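The first traceback shows where it breaks: pdf2image runs pdfinfo and searches its stdout for a "Pages:" line. If stdout is empty or has no such line, re.search returns None, so .group(1) raises the AttributeError, and pdf2image re-raises it as PDFPageCountError with whatever pdfinfo wrote to stderr appended. Here stderr was empty too, which is why the message ends right after "Unable to get page count." A minimal standalone sketch of that parsing step (parse_page_count is a hypothetical helper for illustration, not part of pdf2image):

```python
import re
from typing import Optional


def parse_page_count(out: bytes) -> Optional[int]:
    """Same regex as in the traceback, but returns None instead of crashing."""
    match = re.search(r"Pages:\s+(\d+)", out.decode("utf8", "ignore"))
    # re.search returns None when stdout has no "Pages:" line (e.g. empty output);
    # calling .group(1) on that None is what raises the AttributeError above.
    return int(match.group(1)) if match else None


print(parse_page_count(b"Title: x\nPages:          3\nEncrypted: no\n"))  # 3
print(parse_page_count(b""))  # None
```

So an empty stdout from pdfinfo is enough on its own to produce exactly this traceback.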

Additional Information

I've tried several different PDF files, and the error occurs every time. I've also confirmed that poppler's installation path is correctly configured in the system environment variables.
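One thing worth ruling out is that the Python process doesn't see the same PATH as the shell where poppler was configured: environment variables are inherited when a process starts, so a PATH change made after the IDE or Anaconda prompt was opened isn't visible until that program is restarted. A minimal sketch (find_pdfinfo is a hypothetical helper) that reports which pdfinfo this interpreter would pick up, or checks a specific folder such as the one you would pass to convert_from_path as poppler_path:

```python
import os
import shutil
import sys
from typing import Optional


def find_pdfinfo(poppler_path: Optional[str] = None) -> Optional[str]:
    """Return the pdfinfo executable this Python process would use, or None."""
    exe = "pdfinfo.exe" if sys.platform == "win32" else "pdfinfo"
    if poppler_path is not None:
        # An explicit directory, e.g. the value you would pass as poppler_path=
        candidate = os.path.join(poppler_path, exe)
        return candidate if os.path.isfile(candidate) else None
    # Otherwise search PATH as seen by *this* process (PATHEXT is honoured on Windows)
    return shutil.which("pdfinfo")


if __name__ == "__main__":
    print("pdfinfo on PATH:", find_pdfinfo())
```

If this prints None from the same environment that runs the failing script, the PATH setting is not reaching Python, even if pdfinfo works in a fresh terminal.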

Any help solving this would be appreciated. Thank you!

ZKHMao, May 05 '25 14:05

Hi. I tried to reproduce this issue by running the library on:

  1. Fake pdf: echo "Some random text" > broken.pdf
  2. Blank pdf: touch blank.pdf
  3. Corrupted pdf: head -c 100 original.pdf > corrupted.pdf
  4. Non-existent pdf/wrong path
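touch and head are not available in a default Windows cmd shell, so for anyone wanting to repeat these cases on Windows, here is a portable sketch that creates the same test files from Python. make_test_pdfs and the folder layout are hypothetical; original is any real PDF you have on hand.

```python
from pathlib import Path


def make_test_pdfs(folder: Path, original: Path) -> None:
    """Python equivalents of the shell commands listed above."""
    folder.mkdir(parents=True, exist_ok=True)
    # 1. Fake pdf: plain text with a .pdf extension (like echo "..." > broken.pdf)
    (folder / "broken.pdf").write_text("Some random text\n")
    # 2. Blank pdf: a zero-byte file (like touch blank.pdf)
    (folder / "blank.pdf").write_bytes(b"")
    # 3. Corrupted pdf: only the first 100 bytes of a real PDF (like head -c 100)
    (folder / "corrupted.pdf").write_bytes(original.read_bytes()[:100])
    # 4. Non-existent pdf: nothing to create, just use a path that isn't there
```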

Observations:

  • None of these could be opened with the open command on macOS

Then I ran this code on each of the four cases:

>>> from pdf2image import convert_from_path
>>> images = convert_from_path("<pdf_name>.pdf")

Results:

  1. broken.pdf
         page_count = pdfinfo_from_path(
       File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 611, in pdfinfo_from_path
         raise PDFPageCountError(
     pdf2image.exceptions.PDFPageCountError: Unable to get page count.
     Syntax Warning: May not be a PDF file (continuing anyway)
     Syntax Error: Couldn't find trailer dictionary
     Syntax Error: Couldn't find trailer dictionary
     Syntax Error: Couldn't read xref table
  2. blank.pdf
         page_count = pdfinfo_from_path(
       File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 611, in pdfinfo_from_path
         raise PDFPageCountError(
     pdf2image.exceptions.PDFPageCountError: Unable to get page count.
     Syntax Error: Document stream is empty
  3. corrupted.pdf
         page_count = pdfinfo_from_path(
       File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 611, in pdfinfo_from_path
         raise PDFPageCountError(
     pdf2image.exceptions.PDFPageCountError: Unable to get page count.
     Syntax Error: Unterminated string
     Syntax Error: End of file inside dictionary
     Syntax Error: Couldn't find trailer dictionary
     Syntax Error: Couldn't find trailer dictionary
     Syntax Error: Couldn't read xref table
  4. Non-existent pdf
         page_count = pdfinfo_from_path(
       File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 611, in pdfinfo_from_path
         raise PDFPageCountError(
     pdf2image.exceptions.PDFPageCountError: Unable to get page count.
     I/O Error: Couldn't open file 'corrputed.pdf': No such file or directory.
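Since the same PDFPageCountError covers all four cases, a few cheap checks on the file itself can narrow things down before looking at poppler. A minimal sketch, assuming a hypothetical helper diagnose_pdf; the header and trailer checks are heuristics based on the fact that a PDF file starts with a %PDF- header and normally ends with a %%EOF marker, so passing them does not guarantee poppler can parse the file.

```python
import os
from typing import Optional


def diagnose_pdf(path: str) -> Optional[str]:
    """Return a short reason the file looks unusable, or None if nothing obvious is wrong."""
    if not os.path.isfile(path):
        return "file does not exist"                      # case 4
    size = os.path.getsize(path)
    if size == 0:
        return "file is empty"                            # case 2
    with open(path, "rb") as f:
        head = f.read(1024)
        f.seek(max(0, size - 1024))
        tail = f.read()
    if b"%PDF-" not in head:
        return "no %PDF- header (not a PDF?)"             # case 1
    if b"%%EOF" not in tail:
        return "no %%EOF marker near the end (truncated?)"  # case 3
    return None
```

If every file passes these checks and the error persists, as in the original report, the problem more likely lies in how pdfinfo is being run than in the PDFs.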

So the "Unable to get page count" error you are seeing could be caused by any of these scenarios, or by something else entirely. Could you attach the actual PDFs, @ZKHMao?

brownsloth, Jun 19 '25 12:06

Got the same issue: no matter which PDF I feed it, it raises pdf2image.exceptions.PDFPageCountError: Unable to get page count.

Poppler is installed correctly, and the out value returned by this call is always empty:

out, err = proc.communicate(timeout=timeout)
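To see what pdfinfo actually returns in that situation, it can help to run it by hand from the same Python environment and look at the exit code and stderr as well as stdout. With a working poppler and a valid file, stdout should contain the "Pages:" line that pdf2image searches for. A minimal sketch; run_pdfinfo is a hypothetical helper, and the path in the usage block is a placeholder.

```python
import subprocess
from typing import Optional, Tuple


def run_pdfinfo(pdf_path: str, pdfinfo: str = "pdfinfo",
                timeout: Optional[float] = None) -> Tuple[Optional[int], str, str]:
    """Run pdfinfo directly and return (returncode, stdout, stderr) for inspection."""
    try:
        proc = subprocess.run([pdfinfo, pdf_path], capture_output=True, timeout=timeout)
    except FileNotFoundError:
        # The executable itself could not be started (wrong name or not on PATH)
        return None, "", f"could not start {pdfinfo!r}"
    # Decode with the same settings the traceback shows (utf8, errors ignored)
    return (
        proc.returncode,
        proc.stdout.decode("utf8", "ignore"),
        proc.stderr.decode("utf8", "ignore"),
    )


if __name__ == "__main__":
    code, out, err = run_pdfinfo(r"path/to/your/pdf/file.pdf")
    print("exit code:", code)
    print("stdout:", repr(out))
    print("stderr:", repr(err))
```

An empty stdout together with a non-zero exit code points at pdfinfo failing; an empty stdout with exit code 0 would suggest the output is being lost between pdfinfo and Python.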

pot-code, Aug 26 '25 01:08