tesseract.js Huge performance gap between C++ version and WASM version

Open fchahun opened this issue 5 years ago • 1 comments

Hello,

I recently ran a performance comparison between the latest alpha release of Tesseract, v5.0.0-alpha (C++), and the latest release of tesseract.js v2 (Node/TypeScript).

The JS/WASM port appears to be about 30 times slower than the C++ compiled executable to process the attached image.

On the same platform (Windows 10 64-bit), with 1 worker in both cases:

tesseract input_test_3.png out -l fra txt hocr => 4 secs
tesseract.js (executed with NodeJs 12.4.1) => 120 secs

Both timings include the initialization and model loading steps, but most of the gap seems to occur during OCR processing.
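
To check where the time goes, each phase can be timed separately. The following is a minimal sketch, not the script used for the measurements above: `timePhases` is a hypothetical helper, and the commented usage assumes the tesseract.js v2 worker API (`createWorker`, `load`, `loadLanguage`, `initialize`, `recognize`, `terminate`) with the test image saved as `input_test_3.png`.

```javascript
// Hypothetical helper: runs named async phases in order and records
// the wall-clock time of each one in milliseconds.
async function timePhases(phases) {
  const timings = {};
  for (const [name, run] of phases) {
    const start = process.hrtime.bigint();
    await run();
    timings[name] = Number(process.hrtime.bigint() - start) / 1e6;
  }
  return timings;
}

// Usage sketch (assumes tesseract.js v2 is installed; not executed here):
//
// const { createWorker } = require('tesseract.js');
// const worker = createWorker();
// timePhases([
//   ['load', () => worker.load()],
//   ['loadLanguage', () => worker.loadLanguage('fra')],
//   ['initialize', () => worker.initialize('fra')],
//   ['recognize', () => worker.recognize('input_test_3.png')],
//   ['terminate', () => worker.terminate()],
// ]).then(console.table);
```

If the `recognize` entry dominates the total, the gap lies in OCR processing rather than in initialization or model loading.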

This test image is intentionally a poor-quality scan, meant to test how resilient OCR quality is, but the x30 performance ratio appeared consistently across all the tests I performed.

Is a performance gap this large expected with this WASM port? According to published reports (e.g. https://arxiv.org/pdf/1901.09056.pdf), the WASM performance penalty vs native is expected to be up to x3, not x30.

Best Regards

[Attached image: input_test_3]

Jan 10 '20 12:01 fchahun

Does it work in both cases though? How do you manage to associate the extracted data of each row with the right column?

Apr 21 '20 13:04 EHadoux

Closing, as the significant performance disparity between C++ and WASM was resolved in the latest release, 3.0.0. Recognition speeds with the .wasm version should now be in the same ballpark as the desktop version.

Note: anything run on Safari will continue to be significantly slower, as Apple has not yet implemented SIMD support. If you're reading this in the future, the following link should show whether that has changed.

https://webassembly.org/roadmap/
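
SIMD support can also be checked at runtime in a given browser or Node.js version. The following is a minimal sketch under stated assumptions, not something tesseract.js is confirmed to do internally: `hasWasmSimd` and `SIMD_PROBE` are hypothetical names, and the probe is a small hand-assembled module that uses the `v128` type, passed to the standard `WebAssembly.validate` call.

```javascript
// Hand-assembled probe module (an assumption of this sketch): one function
// of type () -> v128 whose body is `i32.const 0; i8x16.splat`.
// Engines without SIMD support reject it during validation.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,         // "\0asm" magic number, version 1
  1, 5, 1, 96, 0, 1, 123,              // type section: () -> v128
  3, 2, 1, 0,                          // function section: one function, type 0
  10, 8, 1, 6, 0, 65, 0, 253, 15, 11,  // code section: i32.const 0; i8x16.splat; end
]);

// Returns true when the current engine accepts a module that uses SIMD.
function hasWasmSimd() {
  return typeof WebAssembly === 'object' && WebAssembly.validate(SIMD_PROBE);
}

console.log('WASM SIMD supported:', hasWasmSimd());
```

A `false` result suggests that recognition in that environment runs without SIMD and is likely to be slower.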

Aug 20 '22 04:08 Balearica
