
[pdf] Switch PDF page rendering to an iterator format

Open frgfm opened this issue 3 years ago • 1 comments

As suggested by @mara004 in #1000, the PDF rendering currently uses a list comprehension, so no page can be processed until every page has been rendered: https://github.com/mindee/doctr/blob/78c9105bb6c27c4a35ee9f601263da01668224b1/doctr/io/pdf.py#L45

This could be fixed by yielding each page as soon as it is rendered, which also limits RAM usage:

with pdfium.PdfDocument(file, password=password) as pdf:
    for img in pdf.render_topil(scale=scale, **kwargs):
        yield np.asarray(img)
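To illustrate the difference in ordering, here is a minimal sketch that does not touch pypdfium2 at all. `fake_render`, `read_pages_eager`, `read_pages_lazy` and `trace` are hypothetical stand-ins invented for this example, and the "pages" are just small byte buffers. It only shows that a generator interleaves rendering and processing, whereas a list comprehension finishes all rendering first.

```python
from typing import Iterable, Iterator, List


def fake_render(num_pages: int, log: List[str]) -> Iterator[bytes]:
    """Hypothetical stand-in for a PDF renderer: produces one page at a time."""
    for idx in range(num_pages):
        log.append(f"render {idx}")
        yield bytes(16)  # placeholder for the rendered image data


def read_pages_eager(num_pages: int, log: List[str]) -> List[bytes]:
    """List comprehension: every page is rendered and kept in memory up front."""
    return [page for page in fake_render(num_pages, log)]


def read_pages_lazy(num_pages: int, log: List[str]) -> Iterator[bytes]:
    """Generator: each page is rendered only when the consumer asks for it."""
    for page in fake_render(num_pages, log):
        yield page


def trace(pages: Iterable[bytes], log: List[str]) -> List[str]:
    """Consume the pages and record when each one gets processed."""
    for idx, _ in enumerate(pages):
        log.append(f"process {idx}")
    return log


eager_log: List[str] = []
trace(read_pages_eager(2, eager_log), eager_log)

lazy_log: List[str] = []
trace(read_pages_lazy(2, lazy_log), lazy_log)

print(eager_log)
print(lazy_log)
```

In the eager variant both pages are rendered before the first one is processed. In the lazy variant each page is processed right after it is rendered, so only one page needs to be held at a time, provided the consumer does not keep references to earlier pages.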

frgfm avatar Jul 29 '22 12:07 frgfm

Waiting until #1032 is available

felixdittrich92 avatar Sep 01 '22 19:09 felixdittrich92

Outdated by #1240

felixdittrich92 avatar Jul 24 '23 08:07 felixdittrich92