
[Feature]: Support sidecar text output to io.StringIO()

Open · MAbdElRaouf opened this issue 1 year ago · 0 comments

Describe the proposed feature

Passing sidecar="page_text.txt" works as expected: the file written to disk contains the recognized text. However, passing an io.StringIO() as sidecar does not work, and no text is written to the buffer:

import io

import ocrmypdf

output_pdf_file_obj = io.BytesIO()
page_text_buffer = io.StringIO()

ocrmypdf.ocr(
    "source_pdf.pdf",
    output_pdf_file_obj,
    sidecar=page_text_buffer,
    pages=1,
    tesseract_pagesegmode=1,
)

# Both checks show an empty buffer: no sidecar text was written to it.
page_text_buffer.seek(0)
print(repr(page_text_buffer.read()))  # ''
print(repr(page_text_buffer.getvalue()))  # ''
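Because the issue reports that a file path works, one possible workaround until in-memory text sidecars are supported is to let OCRmyPDF write the sidecar to a temporary file and then load that file into an io.StringIO. This is a minimal sketch under stated assumptions: the helper name ocr_with_sidecar_buffer is hypothetical and not part of OCRmyPDF, the sidecar file is assumed to be UTF-8 text, and the OCR function is passed in as a parameter so the helper can be exercised without a real PDF.

```python
import io
import os
import tempfile
from typing import Any, Callable


def ocr_with_sidecar_buffer(
    ocr_func: Callable[..., Any],
    input_file: Any,
    output_file: Any,
    **ocr_kwargs: Any,
) -> io.StringIO:
    """Hypothetical helper: run ocr_func with a temporary sidecar *path*,
    then return the sidecar text in an in-memory StringIO."""
    # Create an empty temp file and close our handle so ocr_func can write it.
    fd, sidecar_path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # The path-based sidecar is the variant the issue reports as working.
        ocr_func(input_file, output_file, sidecar=sidecar_path, **ocr_kwargs)
        # Assumption: the sidecar text is UTF-8 encoded.
        with open(sidecar_path, encoding="utf-8") as f:
            # StringIO(initial_value) starts at position 0, ready to read.
            return io.StringIO(f.read())
    finally:
        # Remove the temp file whether or not the OCR call succeeded.
        os.remove(sidecar_path)
```

For the reproduction above, the call would look like ocr_with_sidecar_buffer(ocrmypdf.ocr, "source_pdf.pdf", output_pdf_file_obj, pages=1, tesseract_pagesegmode=1). The cost is one round trip through the filesystem, but the temporary file is always cleaned up, even if OCR raises.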

MAbdElRaouf, Feb 13 '24 04:02