tesseract.js-core
Emscripten port of Tesseract C++ API

Core part of tesseract.js: the original Tesseract engine, compiled from C++ to JavaScript/WebAssembly with Emscripten.
Structure
- Build scripts are in the build-scripts folder
- JavaScript/wrapper files are in the javascript folder
- All dependencies (including Tesseract) are in the third_party folder
  - All dependencies are unmodified except for Tesseract, which uses a forked repo
  - The Tesseract repo has the following changes:
    - Modified CMakeLists.txt to build with emscripten
    - Modified ltrresultiterator.h and ltrresultiterator.cpp to add WordChoiceIterator class
    - Added src/arch_see folder, which is used instead of src/arch for the simd-enabled build
      - This hard-codes the use of the SSE function
    - Commented out "Empty page!!" message in src/textord/colfind.cpp to prevent this from printing to console
    - Modified src/ccmain/thresholder.cpp, src/ccmain/thresholder.h, src/api/baseapi.cpp, and include/tesseract/baseapi.h to add option for rotating images using exif orientation tag
Running Minimal Examples
To run the browser examples, launch a web server in the root of the repo (e.g. run http-server), then open the pages in examples/web/minimal/ in your browser.
To run the node examples, change into examples/node/minimal/ and run one of the scripts, e.g. node index.wasm.js.
The "benchmark" examples behave similarly, except that they take longer to run and report runtime instead of recognition text. All other examples are experimental and should not be expected to run.
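As a rough sketch of the steps above, the following shell snippet prints the URLs to open once a web server is running. It assumes the example pages are .html files and that the server listens on port 8080 (the http-server default). The helper name list_example_urls is made up for illustration and is not part of this repo.

```shell
# Hypothetical helper: print one URL per minimal browser example.
# Assumes the pages are .html files and the server listens on port 8080
# (the http-server default); pass a different port as the second argument.
list_example_urls() {
  root="${1:-.}"
  port="${2:-8080}"
  for page in "$root"/examples/web/minimal/*.html; do
    [ -e "$page" ] || continue   # the glob matched nothing
    echo "http://localhost:${port}/examples/web/minimal/$(basename "$page")"
  done
}

# Typical use, from the root of the repo:
#   npx http-server . &                                 # serve the repo root in the background
#   list_example_urls .                                 # then open the printed URLs in a browser
#   (cd examples/node/minimal && node index.wasm.js)    # run a node example
```

If the server runs on another port, pass it explicitly, e.g. list_example_urls . 3000.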
Contribution
Dependencies are managed as git submodules, so remember to pass --recursive when cloning the repository:
$ git clone --recursive https://github.com/naptha/tesseract.js-core
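If the repository was already cloned without --recursive, the submodules can still be fetched afterwards with git submodule update --init --recursive. The sketch below wraps that in a simple check. The function name submodules_ready and the assumption that every dependency is a direct subfolder of third_party are illustrative only.

```shell
# Hypothetical helper: succeed only if every folder under third_party/ is non-empty.
# An empty folder usually means the submodule was never checked out.
submodules_ready() {
  dir="${1:-third_party}"
  [ -d "$dir" ] || return 1
  for sub in "$dir"/*/; do
    [ -d "$sub" ] || return 1                          # no subfolders at all
    [ -n "$(ls -A "$sub" 2>/dev/null)" ] || return 1   # empty submodule folder
  done
  return 0
}

# Typical use, from the root of the repo:
#   submodules_ready || git submodule update --init --recursive
```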
To build tesseract-core.js yourself, install Docker and run:
$ bash build-with-docker.sh
The generated files will be stored in the root of the repo.