[bug] doctr 0.5.1 | pypdfium2 --> AttributeError: module 'pypdfium2' has no attribute 'render_pdf_topil'

Open altomator opened this issue 3 years ago • 7 comments

Bug description

I just installed doctr on OSX (TensorFlow backend), and I ran into an error from pypdfium2 when reading a PDF file.

Code snippet to reproduce the bug

import os
import urllib.request

# Let's pick the desired backend
os.environ['USE_TF'] = '1'

import matplotlib.pyplot as plt

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Download a sample
urllib.request.urlretrieve("https://eforms.com/download/2019/01/Cash-Payment-Receipt-Template.pdf", "test.pdf")
# Read the file
doc = DocumentFile.from_pdf("test.pdf")
print(f"Number of pages: {len(doc)}")

Error traceback

Traceback (most recent call last):
  File "/Users/bnf/Desktop/docTR/test.py", line 22, in <module>
    doc = DocumentFile.from_pdf("test.pdf")
  File "/Users/bnf/.virtualenvs/dl4cv/lib/python3.9/site-packages/doctr/io/reader.py", line 37, in from_pdf
    return read_pdf(file, **kwargs)
  File "/Users/bnf/.virtualenvs/dl4cv/lib/python3.9/site-packages/doctr/io/pdf.py", line 42, in read_pdf
    return [np.asarray(img) for img, _ in pdfium.render_pdf_topil(file, scale=scale, **kwargs)]
AttributeError: module 'pypdfium2' has no attribute 'render_pdf_topil'
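
A quick way to confirm what the traceback suggests is to check whether the installed module still exposes the helper that doctr 0.5.1 calls. This is only a diagnostic sketch: `has_render_pdf_topil` and `describe` are hypothetical helper names, and the only library name used is `render_pdf_topil`, taken from the traceback above.

```python
def has_render_pdf_topil(module):
    """Return True if the module exposes the helper named in the traceback."""
    return hasattr(module, "render_pdf_topil")


def describe(module):
    """Build a short human-readable diagnosis (hypothetical helper)."""
    if has_render_pdf_topil(module):
        return "render_pdf_topil is present; doctr 0.5.1 should be able to call it"
    return "render_pdf_topil is missing; the installed pypdfium2 is likely too new for doctr 0.5.1"


# Usage, assuming pypdfium2 is installed in the active environment:
#     import pypdfium2 as pdfium
#     print(describe(pdfium))
```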

Environment

DocTR version: 0.5.1
TensorFlow version: 2.9.1
PyTorch version: N/A (torchvision N/A)
OpenCV version: 4.6.0
OS: Mac OSX 10.14.6
Python version: 3.9.13
Is CUDA available (TensorFlow): No
Is CUDA available (PyTorch): N/A
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Deep Learning backend

print(f"is_tf_available: {is_tf_available()}")
is_tf_available: True
print(f"is_torch_available: {is_torch_available()}")
is_torch_available: False

altomator avatar Jun 11 '22 19:06 altomator

Hi @altomator, could you please downgrade pypdfium2 to version 1.0.0 and test it again? The next release will support v2.0.0 :)

felixdittrich92 avatar Jun 12 '22 08:06 felixdittrich92

@mara004 have you changed anything in pypdfium2 v2.1 related to this issue? :sweat_smile: I also saw you added __enter__ and __exit__ in 2.1, so would you maybe open a PR to replace closing with a with block? :D

felixdittrich92 avatar Jun 16 '22 13:06 felixdittrich92
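
The swap being asked about, from `contextlib.closing` to a plain `with` block, can be sketched as follows. `StandInDocument` is a hypothetical stand-in written for this illustration, not the pypdfium2 class; the sketch only assumes an object that has `close()` plus `__enter__` and `__exit__`.

```python
from contextlib import closing


class StandInDocument:
    """Hypothetical stand-in for a closable document object."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    # Context manager protocol: lets the object be used directly in `with`.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


def use_with_closing():
    """Older pattern: wrap the object so close() runs on exit."""
    doc = StandInDocument()
    with closing(doc):
        pass  # work with the document here
    return doc.closed


def use_with_directly():
    """Pattern enabled by __enter__/__exit__: no wrapper needed."""
    with StandInDocument() as doc:
        pass  # work with the document here
    return doc.closed
```

Both functions leave the object closed; the second just drops the wrapper.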

Concerning the issue reported by @altomator, the traceback shows that pypdfium2 version 2 is being used as if it were version 1, and this is already fixed in main. I did indeed add the context manager API, so PdfDocument can now be used in a with block. However, I guess the change is so simple that a member of the doctr team could probably commit it directly. You may also want to consider adding an upper-bound version limit for pypdfium2 (i.e. >=2.1, <3) to prevent similar issues in the future.

mara004 avatar Jun 16 '22 16:06 mara004
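
To illustrate what a bound such as >=2.1, <3 would accept, here is a minimal sketch of a runtime check. It is not how doctr pins its dependencies; the function names are hypothetical, and the version parsing is deliberately simplistic (it reads only the leading `major.minor` digits).

```python
import re
from importlib import metadata


def major_minor(version):
    """Extract (major, minor) from a string such as '2.1.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def in_supported_range(version, lower=(2, 1), upper_major=3):
    """Check lower <= version < upper_major.0, comparing major.minor only."""
    parsed = major_minor(version)
    return lower <= parsed and parsed[0] < upper_major


def installed_version(package="pypdfium2"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("pypdfium2 is not installed")
    else:
        print(f"pypdfium2 {found} in range: {in_supported_range(found)}")
```

In a real project the same constraint would normally live in the package's dependency declaration rather than in a runtime check.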

Solution: for the current doctr (version v0.5.1, available via pip), you need to downgrade pypdfium2 to version 1.0.0:

pip3 install pypdfium2==1.0.0

This will be fixed with the next release.

felixdittrich92 avatar Jun 17 '22 06:06 felixdittrich92

solution: for the current pip version v0.5.1 [...]

I guess you meant "for the current doctr version" ;) ?

mara004 avatar Jun 17 '22 18:06 mara004

solution: for the current pip version v0.5.1 [...]

I guess you meant "for the current doctr version" ;) ?

Updated :)

felixdittrich92 avatar Jun 17 '22 18:06 felixdittrich92

We'll schedule a minor patch release to fix this shortly as I believe it could impact a lot of new users who install pypdfium2 for the first time :+1:

frgfm avatar Jun 23 '22 16:06 frgfm

solved after 0.6.0 release

felixdittrich92 avatar Sep 26 '22 07:09 felixdittrich92

But the release hasn't been made yet :sweat_smile: For the sake of not forgetting issues tracked by a release, I think it's safer to close this afterwards

frgfm avatar Sep 26 '22 08:09 frgfm

released

felixdittrich92 avatar Sep 29 '22 12:09 felixdittrich92