doctr
[pdf] Switch PDF page rendering to an iterator format
As suggested by @mara004 in #1000, PDF rendering currently uses a list comprehension, so no page can be processed until every page has been rendered: https://github.com/mindee/doctr/blob/78c9105bb6c27c4a35ee9f601263da01668224b1/doctr/io/pdf.py#L45
This could be fixed by yielding each page as soon as it is rendered, which would also limit RAM usage:
with pdfium.PdfDocument(file, password=password) as pdf:
    for img in pdf.render_topil(scale=scale, **kwargs):
        yield np.asarray(img)
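To illustrate the difference outside of doctr, here is a minimal sketch that compares an eager list comprehension with a generator. It does not use pypdfium2: `render_page`, `read_pages_eager`, `read_pages_lazy` and the `rendered` log are hypothetical names made up for this example, and the "page" is just a placeholder byte buffer.

```python
from typing import Iterator, List

# Log of page indices rendered so far (illustration only).
rendered: List[int] = []


def render_page(index: int) -> bytes:
    """Hypothetical stand-in for an expensive page render."""
    rendered.append(index)
    return bytes(16)  # placeholder buffer, not real pixel data


def read_pages_eager(num_pages: int) -> List[bytes]:
    # List comprehension: every page is rendered and kept in memory
    # before the caller receives anything.
    return [render_page(i) for i in range(num_pages)]


def read_pages_lazy(num_pages: int) -> Iterator[bytes]:
    # Generator: a page is rendered only when the caller asks for it.
    for i in range(num_pages):
        yield render_page(i)


pages = read_pages_lazy(3)
first = next(pages)                   # renders page 0 only
rendered_after_first = list(rendered)
```

With the generator, the caller can process and discard each page before the next one is rendered, so at most one page needs to be held at a time. Callers that need random access can still wrap the call in `list(...)`.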
Waiting until #1032 is available
Outdated by #1240