OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Results 227 OCRmyPDF issues

OCRmyPDF works great on PDFs of scanned images. However, for handwritten letters, the Tesseract OCR engine often struggles. How do I use Azure OCR API as...

enhancement

I use OCRmyPDF to process lots of "native" PDFs (by that I mean PDFs generated by Word, etc.). Due to some constraints, a lot of these PDFs have to be...

I may be missing something but it seems that the value of `self._has_text` set in this section of code: https://github.com/ocrmypdf/OCRmyPDF/blob/5c6030960945fe299291fa134cff35c86a644b9f/src/ocrmypdf/pdfinfo/info.py#L779-L788 is always overwritten here: https://github.com/ocrmypdf/OCRmyPDF/blob/5c6030960945fe299291fa134cff35c86a644b9f/src/ocrmypdf/pdfinfo/info.py#L804-L822

robustness

**Describe the issue** If you want to create a perfect OCR, 100% correct text, you need some editing function. For example "gImageReader" gives some basic editing function (but has some...

enhancement

**Issue by [drdownload](https://github.com/drdownload)** _Thu Oct 30 08:25:16 2014_ _Originally opened as https://github.com/fritz-hh/OCRmyPDF/issues/98_ --- it would be great to have an option to remove blank pages. I scan a lot of...

enhancement

Added a GitHub action for this project. You can find the action here. [OCR PDF Action: A GitHub action for turning scanned PDF's into searchable documents](https://github.com/MarketingPipeline/OCR-PDF-Action) :+1:

**Is your feature request related to a problem? Please describe.** My use case is "scanning" documents with a smartphone camera, then archiving those "scans" as low-quality monochrome images. But OCR...

enhancement

Hi. First, sorry for my poor English. **Description** Recently I upgraded my tesseract engine from v4.0.0.20181030 to v5.0.0-alpha.20201127 and two things happened. One is that there is a space between every single...

third party issue

**Describe the bug** After rearranging pages with `pdfjam` in a scanned document, the resulting file with images cannot be optimized, because the image type is unexpected (`/Form`). **To Reproduce** A...

@jbarlow83, you're amazing for putting this out here. Just wanted to drop a note to say thanks! :smile: