OCRmyPDF [Feature]: sidecar Support Text Output to io.StringIO()


Open MAbdElRaouf opened this issue 1 year ago • 0 comments

Describe the proposed feature

Passing sidecar="page_text.txt" works well: the file written to disk contains the expected text. However, passing an io.StringIO() as sidecar does not seem to work, and no text is written to the buffer:

import io

import ocrmypdf

# In-memory destination for the OCR'd PDF
output_pdf_file_obj = io.BytesIO()

# In-memory text buffer that should receive the sidecar text
page_text_buffer = io.StringIO()

ocrmypdf.ocr(
    "source_pdf.pdf",
    output_pdf_file_obj,
    sidecar=page_text_buffer,
    pages=1,
    tesseract_pagesegmode=1,
)

>>> page_text_buffer.seek(0)
>>> print(page_text_buffer.read())
''

>>> page_text_buffer.seek(0)
>>> print(page_text_buffer.getvalue())
''

Feb 13 '24 04:02 MAbdElRaouf
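A possible workaround until in-memory text sidecars are supported: since a path-based sidecar is reported to work, let OCRmyPDF write the sidecar to a temporary file and then load its text into an io.StringIO. This is a minimal sketch, not part of OCRmyPDF's API. `ocr_with_text_sidecar` is a hypothetical helper, the OCR function is passed in (for example `ocrmypdf.ocr`) so the wrapper can be exercised without Tesseract, and the sidecar is assumed to be UTF-8 text.

```python
import io
import tempfile
from pathlib import Path
from typing import Any, Callable


def ocr_with_text_sidecar(
    ocr_func: Callable[..., Any],
    input_file: Any,
    output_file: Any,
    **ocr_kwargs: Any,
) -> io.StringIO:
    """Run ocr_func with a temporary on-disk sidecar and return its text in memory.

    Hypothetical helper: ocr_func is expected to behave like ocrmypdf.ocr and
    accept a filesystem path (as a str) for its sidecar= keyword.
    """
    if "sidecar" in ocr_kwargs:
        raise ValueError("sidecar is managed by this helper; do not pass it")

    with tempfile.TemporaryDirectory() as tmp_dir:
        sidecar_path = Path(tmp_dir) / "sidecar.txt"
        # Path-based sidecars are reported to work, so let OCR write to disk first.
        ocr_func(input_file, output_file, sidecar=str(sidecar_path), **ocr_kwargs)
        # Assumption: the sidecar file is UTF-8 encoded text.
        text = sidecar_path.read_text(encoding="utf-8")

    # The temporary directory and sidecar file are deleted when the with-block exits.
    return io.StringIO(text)
```

With the original example, the call would become `page_text_buffer = ocr_with_text_sidecar(ocrmypdf.ocr, "source_pdf.pdf", output_pdf_file_obj, pages=1, tesseract_pagesegmode=1)`. The trade-off is a short-lived file on disk, in exchange for relying only on the path-based sidecar behavior described above.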
