PDFtoTXT
Python code to read text from a PDF file (OCR).
PDF to TXT
Python code to run OCR on a PDF file and export the recognized text to a TXT file. Two backends are available:
- LocalOCR: based on Tesseract OCR
- CloudOCR: based on Google Vision API
Setup for LocalOCR on Ubuntu
apt-get install python-pyocr python-wand imagemagick
apt-get install libleptonica-dev tesseract-ocr-dev
apt-get install tesseract-ocr-ita
pip install -r requirements.txt
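With these packages installed, the LocalOCR flow can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the function names `txt_path_for` and `ocr_pdf_locally`, the 300 DPI resolution, and the use of Pillow to hand images to pyocr are all assumptions. The `ita` language code matches the `tesseract-ocr-ita` package installed above.

```python
import io
from pathlib import Path


def txt_path_for(pdf_path):
    """Return the output path: same location and name as the PDF, with a .txt suffix."""
    return Path(pdf_path).with_suffix(".txt")


def ocr_pdf_locally(pdf_path, lang="ita", resolution=300):
    """OCR every page of pdf_path and write the text next to it (hypothetical helper)."""
    # Third-party imports stay local so txt_path_for works without them installed.
    import pyocr
    import pyocr.builders
    from PIL import Image as PILImage
    from wand.image import Image as WandImage

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("No OCR tool found; is tesseract-ocr installed?")
    tool = tools[0]  # first tool pyocr detects, normally Tesseract

    pages = []
    # Wand (ImageMagick) rasterizes each PDF page at the requested resolution.
    with WandImage(filename=str(pdf_path), resolution=resolution) as pdf:
        for page in pdf.sequence:
            with WandImage(image=page) as single:
                png = single.make_blob("png")
            text = tool.image_to_string(
                PILImage.open(io.BytesIO(png)),
                lang=lang,
                builder=pyocr.builders.TextBuilder(),
            )
            pages.append(text)

    out_path = txt_path_for(pdf_path)
    out_path.write_text("\n\n".join(pages), encoding="utf-8")
    return out_path
```

Calling `ocr_pdf_locally("scan.pdf")` would then write `scan.txt` next to the input file. Pages are separated by a blank line; that separator is a choice made for this sketch.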
Setup for CloudOCR on Ubuntu
Install Google Cloud SDK
apt-get install poppler-utils google-cloud-sdk-app-engine-python
pip install -r requirements.txt
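The CloudOCR flow could look something like the sketch below: extract the page images with `pdfimages`, send each one to the Google Vision API, and join the results. It is an illustration under several assumptions, not the project's own implementation: the helper names `pdfimages_command` and `ocr_pdf_in_cloud` are hypothetical, the `google-cloud-vision` 2.x client is assumed to be installed (it may or may not be what `requirements.txt` lists), and Google Cloud credentials are assumed to be configured already.

```python
import subprocess
import tempfile
from pathlib import Path


def pdfimages_command(pdf_path, prefix):
    """Build the pdfimages call that extracts embedded images as PNG files."""
    return ["pdfimages", "-png", str(pdf_path), str(prefix)]


def ocr_pdf_in_cloud(pdf_path):
    """OCR the images embedded in pdf_path with Google Vision (hypothetical helper)."""
    # Local import: needs google-cloud-vision and valid credentials at call time.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    pages = []
    with tempfile.TemporaryDirectory() as tmp:
        # pdfimages writes files named <prefix>-000.png, <prefix>-001.png, ...
        subprocess.run(pdfimages_command(pdf_path, Path(tmp) / "page"), check=True)
        for png in sorted(Path(tmp).glob("page-*.png")):
            image = vision.Image(content=png.read_bytes())
            response = client.document_text_detection(image=image)
            if response.error.message:
                raise RuntimeError(response.error.message)
            pages.append(response.full_text_annotation.text)

    out_path = Path(pdf_path).with_suffix(".txt")
    out_path.write_text("\n\n".join(pages), encoding="utf-8")
    return out_path
```

Note that `pdfimages` extracts the images embedded in the PDF rather than rendering pages, so this approach suits scanned documents where each page is a single image.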