Results 16 issues of Ravi

Prohibit do_ocr.py when ONLY .txt files remain and .upload or .log files are NOT available in the root folder. We had one instance when a user tried to run do_ocr.py...
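
A minimal sketch of such a guard, not taken from do_ocr.py itself. The function name `ocr_run_allowed` is hypothetical, and it assumes "ONLY .txt files remain" can be read as: at least one .txt file exists in the root folder and no .upload or .log file sits alongside it.

```python
from pathlib import Path


def ocr_run_allowed(root):
    """Return False when .txt files exist but no .upload/.log files do.

    Hypothetical helper: the real do_ocr.py may track its state differently.
    """
    files = [p for p in Path(root).iterdir() if p.is_file()]
    has_txt = any(p.suffix == ".txt" for p in files)
    has_state = any(p.suffix in {".upload", ".log"} for p in files)
    # Block the run only in the "text left over, no state files" case.
    return not (has_txt and not has_state)
```

A script could call this check at startup and exit with a message instead of starting a new OCR run.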

I intentionally tried uploading text for pages that already exist, since many test books have partial proofreading activity. It gives the following message: Moving the file text_for_page_00010.txt to the...

Run do_ocr.py automatically when pages are not equal at the end of the first do_ocr.py run. Right now, it waits for user input.
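
One way the automatic re-run could look, as a sketch only. `run_until_pages_match` and its `run_ocr` callable are hypothetical stand-ins: `run_ocr` is assumed to perform one OCR pass and return the number of pages it produced, and a retry cap is added so a book that never matches does not loop forever.

```python
def run_until_pages_match(run_ocr, expected_pages, max_attempts=3):
    """Call run_ocr() until it reports expected_pages or attempts run out.

    Returns a tuple (matched, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        produced = run_ocr()  # one full OCR pass (assumed interface)
        if produced == expected_pages:
            return True, attempt
    return False, max_attempts
```

If the cap is reached without a match, the script could fall back to the current behaviour and ask the user what to do.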

File URL - https://upload.wikimedia.org/wikipedia/commons/a/a5/%E0%A6%AC%E0%A7%8D%E0%A6%B0%E0%A6%B9%E0%A7%8D%E0%A6%AE%E0%A6%BE%E0%A6%A3%E0%A7%8D%E0%A6%A1%E0%A6%AA%E0%A7%81%E0%A6%B0%E0%A6%BE%E0%A6%A3%E0%A6%AE%E0%A7%8D%E2%80%8C.pdf WS URL - https://bn.wikisource.org/wiki/%E0%A6%A8%E0%A6%BF%E0%A6%B0%E0%A7%8D%E0%A6%98%E0%A6%A3%E0%A7%8D%E0%A6%9F:%E0%A6%AC%E0%A7%8D%E0%A6%B0%E0%A6%B9%E0%A7%8D%E0%A6%AE%E0%A6%BE%E0%A6%A3%E0%A7%8D%E0%A6%A1%E0%A6%AA%E0%A7%81%E0%A6%B0%E0%A6%BE%E0%A6%A3%E0%A6%AE%E0%A7%8D%E2%80%8C.pdf The file was downloaded and split into 752 single-column PDF files. But it has been saying "uploading file 1 of 360" for more than 3 hours...

This will be a thread to suggest changes in documentation, typos, etc. When do_ocr.py is finished and the pages are not equal, it says "not equval". Change it to "not equal".

As we are getting more hands to use this tool, we are observing two people trying to upload the same book or uploading already existing books. This results in two...

Is it possible to collect usage stats for this tool? This will help to demonstrate impact and push WMF / Google to come up with an official tool based on...

To follow bot best practices, there is a time lag of 5 seconds between two new page creations according to the script. But I see inconsistent page creation speed....
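
For reference, a fixed minimum gap between page creations can be enforced with a small throttle like the sketch below. This is not the script's actual code: the class name `PageCreationThrottle` is hypothetical, and the `clock` and `sleep` parameters exist only so the timing can be tested without real waiting. Measuring the gap from the end of the previous creation keeps the spacing steady even when individual uploads take different amounts of time.

```python
import time


class PageCreationThrottle:
    """Enforce a minimum interval between successive page creations."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous creation, if any

    def wait(self):
        """Sleep just long enough to respect the interval; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self._min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

Calling `wait()` immediately before each page creation would give at least `min_interval` seconds between creations, with no extra delay when the previous upload already took longer than that.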

Please check https://ta.wikisource.org/w/index.php?title=Page%3A%E0%AE%A4%E0%AE%A9%E0%AE%BF_%E0%AE%B5%E0%AF%80%E0%AE%9F%E0%AF%81.pdf%2F79&type=revision&diff=97935&oldid=96015 Google has done an impossible OCR job as the page orientation is wrong. This could be avoided if before creating the individual pdf files after slicing page...

As we intend to crowdsource the computational resources needed for bulk upload, I feel the installation process needs to be simplified. @selvan suggests tools like https://www.docker.com/ can be used to...