[bug] doctr 0.5.1 | pypdfium2 --> AttributeError: module 'pypdfium2' has no attribute 'render_pdf_topil'
Bug description
I just installed doctr on OSX (TensorFlow backend) and ran into an issue with pypdfium2 when reading a PDF file.
Code snippet to reproduce the bug
import os
import urllib.request
# Let's pick the desired backend
os.environ['USE_TF'] = '1'
import matplotlib.pyplot as plt
from doctr.io import DocumentFile
from doctr.models import ocr_predictor
# Download a sample
urllib.request.urlretrieve("https://eforms.com/download/2019/01/Cash-Payment-Receipt-Template.pdf", "test.pdf")
# Read the file
doc = DocumentFile.from_pdf("test.pdf")
print(f"Number of pages: {len(doc)}")
Error traceback
Traceback (most recent call last):
  File "/Users/bnf/Desktop/docTR/test.py", line 22, in <module>
    doc = DocumentFile.from_pdf("test.pdf")
  File "/Users/bnf/.virtualenvs/dl4cv/lib/python3.9/site-packages/doctr/io/reader.py", line 37, in from_pdf
    return read_pdf(file, **kwargs)
  File "/Users/bnf/.virtualenvs/dl4cv/lib/python3.9/site-packages/doctr/io/pdf.py", line 42, in read_pdf
    return [np.asarray(img) for img, _ in pdfium.render_pdf_topil(file, scale=scale, **kwargs)]
AttributeError: module 'pypdfium2' has no attribute 'render_pdf_topil'
Environment
DocTR version: 0.5.1
TensorFlow version: 2.9.1
PyTorch version: N/A (torchvision N/A)
OpenCV version: 4.6.0
OS: Mac OSX 10.14.6
Python version: 3.9.13
Is CUDA available (TensorFlow): No
Is CUDA available (PyTorch): N/A
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Deep Learning backend
print(f"is_tf_available: {is_tf_available()}")
is_tf_available: True
print(f"is_torch_available: {is_torch_available()}")
is_torch_available: False
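The environment report does not include the installed pypdfium2 version, which is exactly what mattered for this bug. A minimal stdlib sketch that prints it next to doctr's, assuming the PyPI distribution names python-doctr and pypdfium2; the helper name package_versions is hypothetical and not part of doctr:

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in the current environment
            versions[name] = None
    return versions


# Assumed PyPI distribution names: doctr is published as "python-doctr"
for name, version in package_versions(["python-doctr", "pypdfium2"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in the same virtualenv as the failing script would show at a glance whether pypdfium2 1.x or 2.x is installed.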
Hi @altomator, could you please downgrade pypdfium2 to version 1.0.0 and test again? The next release will support v2.0.0 :)
@mara004, did you change anything in pypdfium2 v2.1 that relates to this issue? :sweat_smile:
I also saw you added __enter__ and __exit__ in 2.1, so would you perhaps open a PR to replace closing(...) with a plain with statement? :D
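For context on the with-vs-closing suggestion: contextlib.closing only calls close() when the block exits, so once a class implements __enter__ and __exit__ itself, the wrapper becomes redundant. A minimal sketch, assuming "closing" here refers to contextlib.closing; HypotheticalPdfDocument is a stand-in class, not pypdfium2's actual PdfDocument:

```python
from contextlib import closing


class HypotheticalPdfDocument:
    """Stand-in for a document handle; NOT pypdfium2's real class."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        # Release underlying resources (here: just flip a flag)
        self.closed = True

    # Context manager protocol, analogous to what pypdfium2 2.1 added
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never suppress exceptions from the with body


# Before: contextlib.closing supplies the enter/exit behaviour
with closing(HypotheticalPdfDocument("test.pdf")) as doc_a:
    pass

# After: the object manages itself, so a plain with block is enough
with HypotheticalPdfDocument("test.pdf") as doc_b:
    pass

print(doc_a.closed, doc_b.closed)  # True True
```

Both forms close the document even if the body raises, because __exit__ runs cleanup and then lets the exception propagate.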
Concerning the issue reported by @altomator, the traceback shows pypdfium2 version 2 being called as if it were version 1 (render_pdf_topil does not exist in the installed version), so this is already fixed in main.
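Since the failure is an API mismatch rather than a bug inside pypdfium2, code written against the 1.x API could detect it up front and fail with an actionable message instead of a bare AttributeError deep inside read_pdf. A minimal sketch, assuming only that the 1.x API exposes render_pdf_topil as a module attribute (as the doctr 0.5.1 call in the traceback implies); has_v1_api and require_v1_api are hypothetical helpers, not part of doctr or pypdfium2, and the stand-in modules exist only so the example runs without pypdfium2 installed:

```python
import types


def has_v1_api(module):
    # Hypothetical check: doctr 0.5.1 calls pdfium.render_pdf_topil,
    # which the installed 2.x module does not provide.
    return hasattr(module, "render_pdf_topil")


def require_v1_api(module):
    """Raise an actionable error instead of a bare AttributeError later on."""
    if not has_v1_api(module):
        raise ImportError(
            "This doctr version expects the pypdfium2 1.x API "
            "(render_pdf_topil). Run: pip3 install pypdfium2==1.0.0"
        )


# Stand-in module objects so the sketch runs without pypdfium2 installed
fake_v1 = types.ModuleType("pypdfium2")
fake_v1.render_pdf_topil = lambda *args, **kwargs: iter(())
fake_v2 = types.ModuleType("pypdfium2")

print(has_v1_api(fake_v1))  # True
print(has_v1_api(fake_v2))  # False
```

In real code the check would run once, right after import pypdfium2, against the actual module.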
I did indeed add the context manager API, so PdfDocument can now be used in a with block. However, the change is simple enough that a member of the doctr team could probably commit it directly. You may also want to consider adding an upper version bound for pypdfium2 (i.e. >=2.1,<3) to prevent similar issues in the future.
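The suggested bound would normally go into the package requirements as pypdfium2>=2.1,<3. To illustrate which versions that range admits, here is a minimal stdlib sketch; the helper names are hypothetical, and the naive parser assumes plain numeric release strings (pre-release tags such as rc1 are simply ignored, unlike real PEP 440 matching):

```python
import re
from importlib import metadata


def parse_release(version):
    """Turn '2.1.0' into (2, 1, 0); keeps only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def in_range(version, lower=(2, 1), upper=(3,)):
    """Mirror the pin pypdfium2>=2.1,<3 using tuple comparison."""
    release = parse_release(version)
    return lower <= release < upper


def installed_pypdfium2_version():
    """Return the installed pypdfium2 version string, or None if missing."""
    try:
        return metadata.version("pypdfium2")
    except metadata.PackageNotFoundError:
        return None


for candidate in ["1.0.0", "2.0.0", "2.1.0", "2.9.3", "3.0.0"]:
    print(candidate, in_range(candidate))
```

Only 2.1.0 and 2.9.3 fall inside the range. For real-world version strings, packaging.specifiers.SpecifierSet(">=2.1,<3").contains(...) implements full PEP 440 semantics and is the safer choice.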
Solution: for the current doctr release (v0.5.1, available via pip), you need to downgrade pypdfium2 to version 1.0.0:
pip3 install pypdfium2==1.0.0
This will be fixed in the next release.
solution: for the current pip version v0.5.1 [...]
I guess you meant "for the current doctr version"? ;)
Updated :)
We'll schedule a minor patch release to fix this shortly, as I believe it could affect a lot of new users who install pypdfium2 for the first time :+1:
Solved after the 0.6.0 release.
But the release hasn't been made yet :sweat_smile: So that we don't lose track of issues tied to a release, I think it's safer to close this one afterwards.
released