OCR4wikisource
Run do_ocr.py automatically when pages are not equal
Run do_ocr.py again automatically when the page counts are not equal at the end of the first do_ocr.py run. Right now, it waits for user input.
That would create an endless loop: we depend on a third-party tool (Google Drive), and OCR results depend on the quality of the scanned pages, so manual input is necessary. After the first run completes, the next three or four runs could be set to re-run automatically. Any run after that should be started by the user, who could be given two options:
- re-run
- skip the pages that were not OCRed
Skipping pages, as described in #38, can be added here so that the full OCR process can complete.
I see. How about limiting the automatic re-runs to 1 or 2 iterations and then requesting manual input? This way, an endless loop can be avoided.
But this automatic feature is necessary if we are going to run a batch of files together without editing config.ini for every new file. When the tool moves to the cloud, this might be necessary.
This is needed. It happens to me almost every time. At least run do_ocr.py a second time automatically if some pages are not OCRed. After that we can do it manually.
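The bounded retry discussed in this thread could be sketched roughly as follows. This is only an illustration and is not taken from do_ocr.py: `run_with_retries`, the `run_ocr` callback, and `max_auto_runs` are hypothetical names, and the sketch assumes the OCR step can report which pages it completed.

```python
from typing import Callable, Set


def run_with_retries(
    run_ocr: Callable[[Set[int]], Set[int]],
    all_pages: Set[int],
    max_auto_runs: int = 2,
) -> Set[int]:
    """Run OCR a bounded number of times and return the pages still missing.

    run_ocr is a hypothetical callback: it receives the pages to process
    and returns the subset that was OCRed successfully.
    """
    pending = set(all_pages)
    for _ in range(max_auto_runs):
        if not pending:
            break  # every page is done, no further run needed
        done = run_ocr(pending)
        pending -= set(done)  # retry only the pages that are still missing
    return pending
```

If the returned set is not empty, the caller would fall back to manual input and offer the two options mentioned above (re-run, or skip the remaining pages). Capping the loop with `max_auto_runs` is what prevents the endless loop when a page can never be OCRed.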