
Run do_ocr.py automatically when pages are not equal

Open ravidreams opened this issue 9 years ago • 3 comments

Run do_ocr.py again automatically when pages are not equal at the end of the first do_ocr.py run. Right now, it waits for user input.

ravidreams avatar Feb 17 '16 15:02 ravidreams

That would create an endless loop: we rely on a third-party tool (Google Drive), and OCR depends on the quality of the scanned pages, so manual input is necessary. After the first run completes, the next three or four re-runs could be triggered automatically. Any run after that should be started by the user, who could be given two options:

  1. re-run
  2. skip the pages that were not OCRed

The page skipping described in #38 could be added here so that the full OCR process can complete.

jayantanth avatar Feb 18 '16 02:02 jayantanth

I see. How about limiting the automatic re-runs to 1 or 2 and then requesting manual input? That way, an endless loop can be avoided.
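
A minimal sketch of that bounded retry, for illustration only. It does not reflect how do_ocr.py is actually structured: `ocr_with_retries`, `run_pass`, `max_auto_retries`, and `ask_user` are all hypothetical names, and `run_pass` is assumed to take the pending pages and return the set that is still not OCRed.

```python
from typing import Callable, Iterable, Set


def ocr_with_retries(
    pages: Iterable[int],
    run_pass: Callable[[Set[int]], Set[int]],  # hypothetical: returns pages still not OCRed
    max_auto_retries: int = 2,
    ask_user: Callable[[str], str] = input,
) -> Set[int]:
    """Run OCR, re-run automatically a limited number of times, then ask the user."""
    # First run over every page.
    pending = run_pass(set(pages))

    # Automatic re-runs, capped so the loop cannot run forever.
    retries = 0
    while pending and retries < max_auto_retries:
        retries += 1
        pending = run_pass(pending)

    # After the cap is reached, each further run needs manual input.
    while pending:
        answer = ask_user(
            f"{len(pending)} page(s) not OCRed. [r]e-run or [s]kip? "
        ).strip().lower()
        if answer.startswith("s"):
            break  # give up on the remaining pages
        if answer.startswith("r"):
            pending = run_pass(pending)

    # Pages that were skipped (empty set if everything was OCRed).
    return pending
```

With `max_auto_retries=1` this matches the "run a second time automatically, then ask" behaviour; the returned set is whatever the user chose to skip.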

But this automatic feature is necessary if we are going to run a batch of files together without editing config.ini for every new file. When the tool moves to the cloud, this might be necessary.

ravidreams avatar Feb 18 '16 10:02 ravidreams

This is needed. It happens to me almost every time. At least run do_ocr.py a second time automatically if some pages are not OCRed. After that, we can do it manually.

bodhisattwawiki avatar Mar 27 '17 17:03 bodhisattwawiki