tesseract::TessBaseAPI::ProcessPages cannot be stopped on demand
Environment
- Tesseract Version: Tesseract 4.1.1
- Platform: Win10 64bit, VS2017, MFC C++ application
Current Behavior:
tesseract::TessBaseAPI::ProcessPages cannot be stopped on demand. I have not found any way to stop tesseract::TessBaseAPI::ProcessPages when I want to. A timeout can be set, but that does not help.
Expected Behavior:
tesseract::TessBaseAPI::ProcessPages should provide a way to be stopped on demand.
Suggested Fix:
tesseract::TessBaseAPI::ProcessPages should take ETEXT_DESC into account somehow. More details can be found here: https://stackoverflow.com/questions/72719440/stop-tesseracttessbaseapiprocesspages-on-demand
The timeout does not work the way you think it does.
The whole OCR process is done in a few phases.
- Binarization
- Image finding
- Horizontal and vertical lines finding
- Layout analysis
- Text recognition
The time is checked before each phase starts, but there is no way to stop Tesseract in the middle of a phase (except by manually killing the whole process).
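To illustrate the behavior described above, here is a minimal sketch that models it with a simulated clock. It does not use the Tesseract API: `Phase`, `RunResult`, `RunWithPhaseBoundaryTimeout`, and the per-phase costs are all hypothetical names and numbers chosen for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one OCR phase: a name and a simulated cost in ms.
struct Phase {
    std::string name;
    int cost_ms;
};

struct RunResult {
    std::vector<std::string> completed;  // phases that ran to completion
    int elapsed_ms;                      // simulated time when the run ended
    bool timed_out;                      // true if a phase was skipped
};

// Runs the phases in order. The deadline is consulted only before a phase
// starts, which is the assumption this sketch is meant to illustrate.
RunResult RunWithPhaseBoundaryTimeout(const std::vector<Phase>& phases,
                                      int timeout_ms) {
    RunResult r{{}, 0, false};
    for (const Phase& p : phases) {
        if (r.elapsed_ms >= timeout_ms) {  // the only place the clock is checked
            r.timed_out = true;
            break;
        }
        r.elapsed_ms += p.cost_ms;  // once started, a phase always finishes
        r.completed.push_back(p.name);
    }
    return r;
}
```

With phases costing 10, 10, 10, 50 and 5000 ms and a timeout of 100 ms, the last phase still starts (only 80 ms have elapsed at that point) and the run ends at 5080 ms without ever reporting a timeout. That is why a timeout alone cannot act as a stop button in this model.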
So in this case there is no way to stop tesseract::TessBaseAPI::ProcessPages gracefully on demand, and you can close the issue. If you have any hints, please tell me. Thanks.
This timeout feature can be improved by adding a few sub-phases.
I marked this issue as 'feature request', but it will probably have low priority.
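As a rough idea of what sub-phases could look like, here is a sketch of cooperative cancellation inside one long phase. This is not Tesseract code: `RunPhaseInSubPhases`, the `work` callback, and the integer "units" are hypothetical stand-ins (a unit could be one text line, for example), and the flag would be set from another thread such as the UI thread.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <vector>

// Splits one long phase into small units of work and checks a cancel flag
// between units. Returns how many units were processed before stopping.
std::size_t RunPhaseInSubPhases(const std::vector<int>& units,
                                const std::atomic<bool>& cancel_requested,
                                const std::function<void(int)>& work) {
    std::size_t done = 0;
    for (int unit : units) {
        // Cooperative check: never start a new unit once a stop was requested.
        if (cancel_requested.load()) {
            break;
        }
        work(unit);  // placeholder for the real per-unit processing
        ++done;
    }
    return done;
}
```

The caller keeps a `std::atomic<bool>` and sets it to `true` to request a stop. The phase then ends after the unit currently in progress, so the worst-case delay is the cost of one unit rather than the whole phase.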
https://groups.google.com/g/tesseract-ocr/c/qkn_YO4SH-k/m/qIRsKdSJAAAJ
Not sure whether the problem is your implementation or your expectation.
If you want to change the current implementation, a patch is welcome. We have no capacity to support 3rd party projects.
Yes, that would be perfect. I wonder whether I can get involved myself, to solve this as quickly as possible.