pentaplex icon indicating copy to clipboard operation
pentaplex copied to clipboard

An OCR scanner for receipts

#+TITLE: pentaplex #+AUTHOR: phdenzel

A receipt scanner and reader which makes use of tesseract-ocr and imagemagick. It executes five basic functionalities (hence the program's name): 1. scan receipt image (/edge detection and warp transformation with opencv/) 2. preprocess scan (/clean, sharpen, and contrast/) 3. run OCR (/tesseract for optical character recognition/) 4. analyze OCR output (/with fuzzy finder and preconfigured dictionary/) 5. summarize analysis in a csv file

To prepare for the scanning of the receipts, create a directory called ~imgs/~ in the repository, and place pictures of the receipts in it; e.g. in Terminal (~cd~ into the repository first) type something of the sort:

#+BEGIN_SRC shell mkdir -p imgs/ cp ~/Downloads/*.JPG imgs/ #+END_SRC

*** Prerequisites

This program uses
- [[https://github.com/tesseract-ocr/][tesseract-ocr]]
- [[https://www.imagemagick.org/script/index.php][imagemagick]]
- [[https://github.com/opencv/opencv][opencv]]

*** Usage

To run pentaplex, type (of course ~cd~ into repository first):
#+BEGIN_SRC shell
  ./pentaplex [optional: auto]
#+END_SRC

*** Documentation

For code documentation visit:
[[https://phdenzel.github.io/pentaplex/][https://phdenzel.github.io/pentaplex/]]