erpnext_ocr
:snake: :alembic: Optical Character Recognition using tesseract within Frappe.
ERPNext OCR
:alembic: Experimental Frappe OCR application with tesseract.
This project is a fork of ERPNext-OCR by John Vincent Fiel. Its aim is to fix and clean up the original source code and to add new features.
Check out more on ERPNext Discuss.
:chart_with_upwards_trend: Changes
See CHANGELOG
:bookmark: Roadmap
See Taiga.io
:construction: Install
Prerequisites: tesseract and imagemagick
On Debian, install tesseract-ocr, plus imagemagick and ghostscript (needed to work with PDF files), using this command:
sudo apt-get install tesseract-ocr imagemagick libmagickwand-dev ghostscript
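Before installing the app, it can help to confirm that the binaries from these packages are on your PATH. This is a minimal sketch: `missing_commands` is a hypothetical helper (not part of erpnext_ocr), and it assumes the Debian packages above provide the commands `tesseract`, `convert` (ImageMagick 6) and `gs` (ghostscript).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each given command that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}

# Example (tesseract, convert and gs are assumed to come from the
# tesseract-ocr, imagemagick and ghostscript packages):
#   missing_commands tesseract convert gs || echo "install the packages above first"
```

Because it returns non-zero when anything is missing, the helper can also gate a setup script.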
Install Frappe application
bench get-app --branch develop erpnext_ocr https://github.com/Monogramm/erpnext_ocr
bench install-app erpnext_ocr
When the Frappe app is installed, the following Python requirements are installed as well:
- tesserocr: Python binding for tesseract
- pillow: image processing library for Python
- requests: HTTP library for Python
- wand: Python binding for imagemagick
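If installation or OCR fails at runtime, an import check in bench's Python environment can show which requirement is missing. A sketch under stated assumptions: `missing_python_modules` is a hypothetical helper, the import names are `tesserocr`, `PIL` (pillow), `requests` and `wand.image`, and `./env/bin/python` assumes the standard bench directory layout. Importing `tesserocr` and `wand.image` also loads the native tesseract and MagickWand libraries, so a failure there may point at a missing system package rather than a missing pip package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 is a Python interpreter, the remaining arguments
# are module names. Prints every module that fails to import and returns
# non-zero if any did.
missing_python_modules() {
  local python="$1" module status=0
  shift
  for module in "$@"; do
    if ! "$python" -c "import $module" >/dev/null 2>&1; then
      echo "$module"
      status=1
    fi
  done
  return "$status"
}

# Example, run from the bench directory (assumes bench's ./env virtualenv):
#   missing_python_modules ./env/bin/python tesserocr PIL requests wand.image
```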
:rocket: Usage
[Images: file being read; sample screenshot]
Tesseract trained data
To use OCR with other languages, you need to install the matching trained data files. See the tesseract wiki for details: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
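On Debian, trained data is also available as one package per language. This sketch assumes Debian's `tesseract-ocr-<lang>` package naming (for example `tesseract-ocr-fra`) and that `tesseract --list-langs` prints one header line followed by one language code per line; `lang_listed` and `lang_packages` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if language code $1 appears in
# `tesseract --list-langs` output read from stdin (first line is a header).
lang_listed() {
  tail -n +2 | grep -qx -- "$1"
}

# Hypothetical helper: print the assumed Debian package name for each
# tesseract language code given as an argument.
lang_packages() {
  local lang
  for lang in "$@"; do
    printf 'tesseract-ocr-%s\n' "$lang"
  done
}

# Example: install French data if missing, then OCR a (hypothetical) image.
#   tesseract --list-langs 2>&1 | lang_listed fra || sudo apt-get install $(lang_packages fra)
#   tesseract scan.png stdout -l fra
```

The `2>&1` in the example is there because some tesseract versions print the language list on stderr rather than stdout.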
Development
If you want to develop or just test this application locally, run docker-compose up -d at the root of this repository.
You can then access your ERPNext OCR development environment at http://localhost:8080.
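The containers may take a while to become ready after `docker-compose up -d`, so a script that uses the dev environment may want to wait for it first. A minimal sketch with a hypothetical `wait_for_url` helper, assuming curl is installed and the default http://localhost:8080 address above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers successfully, giving up
# after roughly $2 seconds (default 300). Requires curl.
wait_for_url() {
  local url="${1:-http://localhost:8080}" limit="${2:-300}" waited=0
  # -f: treat HTTP error codes as failures; --max-time: don't hang forever
  # on a single request.
  until curl -fs --max-time 5 -o /dev/null "$url"; do
    if [ "$waited" -ge "$limit" ]; then
      echo "Timed out waiting for $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example, from the root of this repository:
#   docker-compose up -d && wait_for_url http://localhost:8080 600
#   docker-compose logs -f   # follow the containers' output
```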
Known issues
- wand.exceptions.PolicyError: not authorized '/opt/sample.pdf' @ error/constitute.c/ReadImage/412
  - This can happen when the ImageMagick security policy prevents it from reading PDF files.
- wand.exceptions.WandRuntimeError: MagickReadImage returns false, but did raise ImageMagick exception. This can occurs when a delegate is missing, or returns EXIT_SUCCESS without generating a raster.
  - This might happen if a dependency needed to convert PDF files is missing, most often ghostscript.
- OSError: encoder error -2 when writing image file
  - This might happen when trying to open a TIFF image; the real error is hidden and only displayed in the console.
  - If the original error in the console is "Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.", the TIFF image compression is usually invalid or not recognized.
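The PolicyError and TIFF issues above can sometimes be worked around from the command line. The helpers below are hypothetical (not part of erpnext_ocr) and assume ImageMagick 6 as packaged on Debian: policy file at /etc/ImageMagick-6/policy.xml, tools named `convert` and `identify`. The PDF restriction exists for security reasons (Ghostscript has had vulnerabilities with untrusted input), so only relax it if you trust the PDF files you process. Re-encoding a TIFF with LZW instead of CCITT Group 3/4 compression is a suggested workaround for the Fax3SetupState error, not a documented fix.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the PolicyError: on lines mentioning the PDF coder,
# change rights="none" to rights="read|write". The original is kept as
# <file>.bak. The default path assumes Debian's ImageMagick 6 layout.
allow_pdf_policy() {
  local policy="${1:-/etc/ImageMagick-6/policy.xml}"
  sed -i.bak '/pattern="PDF"/s/rights="none"/rights="read|write"/' "$policy"
}

# Hypothetical helper for the Fax3SetupState error: re-encode TIFF $1 with
# LZW compression into $2, then print the compression of each written page.
reencode_tiff() {
  local src="$1" dest="$2"
  convert "$src" -compress LZW "$dest" && identify -format '%C\n' "$dest"
}

# Examples:
#   allow_pdf_policy /etc/ImageMagick-6/policy.xml   # run as root
#   reencode_tiff scan.tif scan-lzw.tif
```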
:white_check_mark: Run tests
bench run-tests --app erpnext_ocr
:bust_in_silhouette: Authors
Monogramm
- Website: https://www.monogramm.io
- Github: @Monogramm
John Vincent Fiel
- Github: @jvfiel
:handshake: Contributing
Contributions, issues and feature requests are welcome!
Feel free to check the issues page.
Check the contributing guide.
:thumbsup: Show your support
Give a :star: if this project helped you!
:page_facing_up: License
Copyright © 2019 Monogramm.
This project is MIT licensed.
This README was generated with :heart: by readme-md-generator