ocr-table

Published 20 hours ago •

cseas


Extract tables from scanned image PDFs using Optical Character Recognition.

ocr-table

This project aims to extract tables from scanned image PDFs using Optical Character Recognition.

Install Requirements

Tesseract OCR
```
sudo apt-get install tesseract-ocr
```
ImageMagick
```
sudo apt-get install imagemagick
```
PDF Utilities
```
sudo apt-get install poppler-utils
```
Python packages
```
sudo pip install -r requirements.txt
```
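
Before running the scripts, it can help to confirm that the system packages above are on the PATH. The following is a minimal sketch, not part of this repository: the executable names (`tesseract`, `convert`, `pdftoppm`) are assumptions about what those apt packages typically install, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed mapping from executable name to the apt package that usually provides it.
REQUIRED_TOOLS = {
    "tesseract": "tesseract-ocr",
    "convert": "imagemagick",
    "pdftoppm": "poppler-utils",
}


def missing_tools(which=shutil.which):
    """Return the apt package names whose executable is not found on PATH."""
    return [package for tool, package in REQUIRED_TOOLS.items() if which(tool) is None]


print("Missing packages:", missing_tools() or "none")
```

If the list is not empty, install the named packages with the `apt-get` commands above and run the check again.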

Usage

Clear the pdf/ folder, then copy the PDF files you want to scan into it.
Run the OCR:
```
python3 shellocr.py
```
Once the process completes, the extracted text files are available in the txt/ folder.
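
For readers who want to see roughly what a pdf/ to txt/ pipeline of this kind involves, here is a minimal sketch. It is not the contents of shellocr.py, which are not shown here. It assumes each PDF page is rendered to a PNG with `pdftoppm` (from poppler-utils) and then passed to `tesseract`; the `png/` working folder, the 300 dpi setting, and all function names are hypothetical.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, image_prefix, dpi=300):
    """Build the pdftoppm call that renders every PDF page to <prefix>-<n>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(image_prefix)]


def tesseract_command(image_path, output_base):
    """Build the tesseract call; tesseract appends '.txt' to output_base itself."""
    return ["tesseract", str(image_path), str(output_base)]


def ocr_folder(pdf_dir="pdf", txt_dir="txt", work_dir="png"):
    """OCR every PDF in pdf_dir and write one text file per page into txt_dir."""
    pdf_dir, txt_dir, work_dir = Path(pdf_dir), Path(txt_dir), Path(work_dir)
    txt_dir.mkdir(exist_ok=True)
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        # One image folder per PDF keeps pages of different files apart.
        page_dir = work_dir / pdf.stem
        page_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(pdftoppm_command(pdf, page_dir / "page"), check=True)
        for image in sorted(page_dir.glob("page-*.png")):
            output_base = txt_dir / f"{pdf.stem}-{image.stem}"
            subprocess.run(tesseract_command(image, output_base), check=True)
```

Calling `ocr_folder()` from the repository root would process everything in pdf/. Passing `check=True` makes the run stop at the first failing command rather than silently producing incomplete output.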

Alternate

If the method above doesn't work for you, try this alternate method.
Save your file as input.pdf in the repository's root directory.
Run:
```
python3 pdf_miner.py
```
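
As a rough illustration of a pdfminer-based approach, the sketch below reads text from input.pdf and splits each line into table cells. It is not the contents of pdf_miner.py. It assumes the pdfminer.six package and its `extract_text` function, and the column rule (two or more whitespace characters separate cells) is a simplifying assumption; `extract_text_layer`, `split_columns`, and `text_to_rows` are hypothetical names. Note that pdfminer.six reads a PDF's embedded text layer rather than performing OCR.

```python
import re


def extract_text_layer(pdf_path="input.pdf"):
    """Return the embedded text of a PDF using pdfminer.six."""
    # Imported lazily so the helpers below work without pdfminer.six installed.
    from pdfminer.high_level import extract_text
    return extract_text(pdf_path)


def split_columns(line):
    """Split one text line into cells wherever two or more whitespace characters occur."""
    return re.split(r"\s{2,}", line.strip())


def text_to_rows(text):
    """Turn extracted text into a list of rows, skipping blank lines."""
    return [split_columns(line) for line in text.splitlines() if line.strip()]
```

For example, `text_to_rows(extract_text_layer("input.pdf"))` would return one list of cells per non-empty line, which can then be written out with Python's `csv` module.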

About

Topics: python, shell, ocr, tesseract, optical-character-recognition, extract-tables, ocr-table, pdfminer, scanned-image-pdfs
Stars: 249
Forks: 64
Owner: cseas
