pentaplex
An OCR scanner for receipts
#+TITLE: pentaplex
#+AUTHOR: phdenzel
A receipt scanner and reader that uses tesseract-ocr and imagemagick. It performs five basic steps (hence the program's name):
1. scan the receipt image (/edge detection and warp transformation with opencv/)
2. preprocess the scan (/clean, sharpen, and contrast/)
3. run OCR (/tesseract for optical character recognition/)
4. analyze the OCR output (/with fuzzy finder and preconfigured dictionary/)
5. summarize the analysis in a csv file
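Steps 4 and 5 can be illustrated with a minimal Python sketch. This is not pentaplex's own code: the word list ~KNOWN_ITEMS~, the helpers ~match_item~ and ~summarize~, and the assumed "name price" line layout are all hypothetical, and the standard-library ~difflib~ stands in for whatever fuzzy finder the program actually uses.

```python
import csv
import difflib
import io

# Hypothetical dictionary of known item names; the format of the
# preconfigured dictionary used by pentaplex is an assumption here.
KNOWN_ITEMS = ("milk", "bread", "butter", "coffee")


def match_item(ocr_word, known=KNOWN_ITEMS, cutoff=0.6):
    """Return the closest known item for a noisy OCR token, or None."""
    hits = difflib.get_close_matches(ocr_word.lower(), known, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def summarize(ocr_lines):
    """Turn 'name price' OCR lines into CSV text with corrected item names."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["item", "price"])
    for line in ocr_lines:
        parts = line.rsplit(None, 1)  # split off the last whitespace-separated field
        if len(parts) != 2:
            continue  # no separate price field on this line
        name, price_text = parts
        item = match_item(name)
        try:
            price = float(price_text)
        except ValueError:
            continue  # last field is not a number
        if item is not None:
            writer.writerow([item, f"{price:.2f}"])
    return out.getvalue()
```

Calling ~summarize(["m1lk 1.20", "bre4d 2.5", "total"])~ maps the misread tokens back to dictionary entries and skips the line without a price. The ~cutoff~ value trades false matches against missed ones and would need tuning on real receipts.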
To prepare for scanning, create a directory called ~imgs/~ in the repository and place pictures of the receipts in it. For example, in a terminal (~cd~ into the repository first), type something like:
#+BEGIN_SRC shell
mkdir -p imgs/
cp ~/Downloads/*.JPG imgs/
#+END_SRC
*** Prerequisites
This program requires:
- [[https://github.com/tesseract-ocr/][tesseract-ocr]]
- [[https://www.imagemagick.org/script/index.php][imagemagick]]
- [[https://github.com/opencv/opencv][opencv]]
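Before the first run it can help to confirm that these tools are installed. The sketch below is only an illustration and not part of pentaplex; the helper name ~check_prerequisites~ is hypothetical, and it assumes that tesseract and imagemagick are used through their command-line programs and opencv through its Python module ~cv2~.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report which of the assumed tools can be found on this machine."""
    return {
        # The tesseract-ocr command-line program is called `tesseract`.
        "tesseract": shutil.which("tesseract") is not None,
        # ImageMagick 7 installs `magick`; older releases install `convert`.
        "imagemagick": any(shutil.which(name) for name in ("magick", "convert")),
        # Assumption: opencv is accessed via the Python module `cv2`.
        "opencv": importlib.util.find_spec("cv2") is not None,
    }


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

A "missing" entry only means that the tool was not found under the names assumed here; it may still be installed under a different name or outside the search path.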
*** Usage
To run pentaplex, ~cd~ into the repository and type:
#+BEGIN_SRC shell
./pentaplex [optional: auto]
#+END_SRC
*** Documentation
For code documentation visit:
[[https://phdenzel.github.io/pentaplex/][https://phdenzel.github.io/pentaplex/]]