Unexpectedly slow recognition performance

Open Dan-Shields opened this issue 4 years ago • 1 comments

I'm trying to use tesseract.js to read portions of text from a video feed. Because of #359, I'm feeding each frame to tesseract as a 1920x1080 PNG Buffer, with an 85x20 rectangle to crop to (following this).

The average time for worker.recognize() to complete is 5.5s. This seems really high for just a 1,700-pixel area containing 5-10 characters.

I'm running this on an 8C/16T Ryzen 7 3800X with 32GB of RAM, so I don't think it's a hardware limitation.

So my question is: is this expected performance? If not, what could I try in order to find out what exactly is taking so long?

Code snippet:

const imgBuffer = await image.getBufferAsync('image/png');

console.time('tesseract recog');

const { data: { text } } = await tesseractWorker.recognize(imgBuffer, {
    rectangle: {
        left: 760,
        top: 17,
        width: 85,
        height: 20
    }
});
console.log('result: ', text);

console.timeEnd('tesseract recog');
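One way to narrow down where the time goes is to time each stage separately, since the snippet above only measures recognize() and not the PNG encoding of the full frame. A minimal sketch, assuming Node.js: timeStage and recognizeFrameTimed are hypothetical helper names (not part of tesseract.js), and image and worker are whatever objects the snippet above already uses.

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical helper: runs one async stage and adds its duration (ms)
// to timings[label], even if the stage throws.
async function timeStage(timings, label, fn) {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    timings[label] = (timings[label] || 0) + (performance.now() - start);
  }
}

// Hypothetical wrapper around the two calls from the snippet above.
// `image` must expose getBufferAsync() and `worker` must expose recognize();
// both are passed in, so nothing here depends on a specific library version.
async function recognizeFrameTimed(image, worker, rectangle) {
  const timings = {};
  const imgBuffer = await timeStage(timings, 'encode', () =>
    image.getBufferAsync('image/png'));
  const { data: { text } } = await timeStage(timings, 'recognize', () =>
    worker.recognize(imgBuffer, { rectangle }));
  return { text, timings };
}
```

Calling recognizeFrameTimed(image, tesseractWorker, { left: 760, top: 17, width: 85, height: 20 }) returns the text together with per-stage timings, which shows whether the encoding step or the recognition step dominates.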

Dan-Shields avatar Feb 10 '21 20:02 Dan-Shields

This is a known issue and has been addressed before: https://github.com/naptha/tesseract.js/issues/394

I'd also like to know whether there's anything to do other than using the C++ version to get faster recognition. Unfortunately, that rules out running client-side in the browser.

torava avatar Apr 26 '21 15:04 torava

Closing as the significant performance disparity between C++ and wasm was resolved in the latest release 3.0.0. Feel free to open another issue if you encounter outrageous runtimes, but everything should now feel much faster.

Note: anything run on Safari will continue to be significantly slower as Apple has not yet implemented SIMD support. If you're reading this in the future, the following link should show if that has changed yet.

https://webassembly.org/roadmap/
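To check whether a given runtime accepts WebAssembly SIMD, one option is to validate a tiny module that uses a v128 instruction. This is only a sketch: wasmSimdSupported is a hypothetical name, the byte sequence is a hand-assembled probe module rather than anything shipped by tesseract.js, and it only reports what the runtime accepts, not which build tesseract.js actually loads.

```javascript
// Minimal WebAssembly module: one function returning v128, whose body is
// i32.const 0; i8x16.splat; i8x16.popcnt. It validates only if the
// runtime understands the SIMD proposal.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,      // "\0asm" magic + version 1
  1, 5, 1, 96, 0, 1, 123,           // type section: () -> v128
  3, 2, 1, 0,                       // function section: one func of type 0
  10, 10, 1, 8, 0,                  // code section: one body, no locals
  65, 0, 253, 15, 253, 98, 11,      // i32.const 0; i8x16.splat; i8x16.popcnt; end
]);

// Hypothetical helper: true if the runtime validates the SIMD probe module.
function wasmSimdSupported() {
  return typeof WebAssembly === 'object' && WebAssembly.validate(SIMD_PROBE);
}

console.log('WebAssembly SIMD supported:', wasmSimdSupported());
```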

Balearica avatar Aug 20 '22 04:08 Balearica