OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
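As a rough illustration of that workflow, here is a minimal Python sketch that assembles a basic `ocrmypdf` command line and runs it only if the executable is installed. The helper names `build_ocr_command` and `run_ocr` and the file names are hypothetical, chosen for this example; the `-l` (language) and `--skip-text` options are assumed to behave as the OCRmyPDF command-line help describes.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src, dst, language="eng", skip_text=False):
    """Return the argument list for a basic OCRmyPDF run (hypothetical helper)."""
    cmd = ["ocrmypdf", "-l", language]
    if skip_text:
        # --skip-text asks OCRmyPDF to leave pages that already contain text alone.
        cmd.append("--skip-text")
    cmd += [str(Path(src)), str(Path(dst))]
    return cmd


def run_ocr(src, dst, **options):
    """Run OCRmyPDF if it is on PATH; return its exit code, or None if missing."""
    if shutil.which("ocrmypdf") is None:
        return None
    return subprocess.run(build_ocr_command(src, dst, **options)).returncode


if __name__ == "__main__":
    # Only prints the command; call run_ocr(...) to actually process a file.
    print(build_ocr_command("scan.pdf", "scan_ocr.pdf"))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact invocation before any file is touched.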
OCRmyPDF works great on PDFs containing scanned images. However, with handwritten letters, the Tesseract OCR engine often struggles. How do I use the Azure OCR API as...
I use OCRmyPDF to process lots of "native" PDFs (by that I mean PDFs generated by Word, etc.). Due to some constraints, a lot of these PDFs have to be...
I may be missing something but it seems that the value of `self._has_text` set in this section of code: https://github.com/ocrmypdf/OCRmyPDF/blob/5c6030960945fe299291fa134cff35c86a644b9f/src/ocrmypdf/pdfinfo/info.py#L779-L788 is always overwritten here: https://github.com/ocrmypdf/OCRmyPDF/blob/5c6030960945fe299291fa134cff35c86a644b9f/src/ocrmypdf/pdfinfo/info.py#L804-L822
**Describe the issue** If you want to produce perfect OCR output with 100% correct text, you need some editing functionality. For example, "gImageReader" provides some basic editing functions (but has some...
**Issue by [drdownload](https://github.com/drdownload)** _Thu Oct 30 08:25:16 2014_ _Originally opened as https://github.com/fritz-hh/OCRmyPDF/issues/98_ --- it would be great to have an option to remove blank pages. I scan a lot of...
Added a GitHub action for this project. You can find the action here. [OCR PDF Action: A GitHub action for turning scanned PDF's into searchable documents](https://github.com/MarketingPipeline/OCR-PDF-Action) :+1:
**Is your feature request related to a problem? Please describe.** My use case is "scanning" documents with a smartphone camera, then archiving those "scans" as low-quality monochrome images. But OCR...
Hi. First, sorry for my poor English. **Description** Recently I upgraded my tesseract engine from v4.0.0.20181030 to v5.0.0-alpha.20201127 and two things happened. One is there is space between every single...
**Describe the bug** After rearranging pages with `pdfjam` in a scanned document, the resulting file with images cannot be optimized, because the image type is unexpected (`/Form`). **To Reproduce** A...
@jbarlow83, you're amazing for putting this out here. Just wanted to drop a note to say thanks! :smile: