tesseract
Add OCR'ed text back to the image and generate a PDF
It would be great if this package supported writing the recognized text back onto the raster image and saving the result as a PDF.
For example, using tesseract directly from the command line makes this possible in a single command:
tesseract --dpi 600 --oem 2 input_01.png output_01 pdf
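Until something like this exists in the package, the command above can be scripted. Below is a minimal sketch in Python, assuming the `tesseract` binary is on the PATH. The names `build_ocr_pdf_command` and `ocr_to_pdf` are made up for illustration and are not part of this package or of tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_pdf_command(image, output_base, dpi=600, oem=2):
    """Return the argument list for the tesseract call shown above."""
    # The trailing "pdf" selects tesseract's PDF output config;
    # tesseract itself appends ".pdf" to output_base.
    return ["tesseract", "--dpi", str(dpi), "--oem", str(oem),
            str(image), str(output_base), "pdf"]


def ocr_to_pdf(image, output_base, dpi=600, oem=2):
    """Run tesseract and return the path of the PDF it is expected to write."""
    # Assumption: the tesseract CLI is installed and discoverable on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    subprocess.run(build_ocr_pdf_command(image, output_base, dpi, oem),
                   check=True)
    return Path(f"{output_base}.pdf")
```

Calling `ocr_to_pdf("input_01.png", "output_01")` runs the same command as above and should leave `output_01.pdf` next to the script. Building the argument list in a separate function keeps it easy to inspect or log before anything is executed.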
Second this.
Relatedly, it would be nice if the package could take in a .pdf that contains images and convert them to editable text, returning a new .pdf, similar to how Adobe does it.
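One possible workaround for the PDF-in, PDF-out case, sketched in Python under a few assumptions: Poppler's `pdftoppm` and the `tesseract` CLI are both installed, and every page can simply be rasterized. The function names are hypothetical. It relies on tesseract accepting a text file that lists one image path per line as its input, which produces a single multi-page PDF.

```python
import subprocess
import tempfile
from pathlib import Path


def rasterize_command(pdf, prefix, dpi=300):
    """Argument list for pdftoppm: one PNG per page, named <prefix>-<page>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def write_page_list(pages, list_file):
    """Write one image path per line; tesseract can take this file as input."""
    list_file = Path(list_file)
    list_file.write_text("".join(f"{page}\n" for page in pages))
    return list_file


def pdf_to_searchable_pdf(pdf, output_base, dpi=300):
    """Rasterize every page, OCR the pages, and write <output_base>.pdf."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(rasterize_command(pdf, prefix, dpi), check=True)
        # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
        pages = sorted(Path(tmp).glob("page-*.png"))
        page_list = write_page_list(pages, Path(tmp) / "pages.txt")
        subprocess.run(["tesseract", "--dpi", str(dpi), str(page_list),
                        str(output_base), "pdf"], check=True)
    return Path(f"{output_base}.pdf")
```

Note that this turns each page into an image with a text layer on top, so the result is searchable and copyable but not editable in the way Adobe's output is, and any text already present in the original .pdf is re-recognized rather than preserved.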