VideoSubFinder_ocr_path
All text has no space between the words
Hi, I've tested your program. I edited build_subs.py to add my language so it would be recognized, and it works, but all the recognized words were written with no spaces between them.
"Something to be like this" was saved as "Somethingtobelikethis"
Could you help me?
Yeah, I know what causes that. Tesseract tries to put spaces anywhere it doesn't see a letter, so I added a line to remove spaces from Tesseract's output. That works for Chinese, which is what I was using it for, but obviously doesn't work for other languages.
In the text_fixer function, remove the space entry (' ') from the replaceMe list. It's the second-to-last entry on line 139.
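Here is a minimal sketch of the idea, not the actual code from build_subs.py. The shortened REPLACE_ME list and the strip_spaces parameter are assumptions made up for illustration. It only shows why dropping ' ' from the list keeps the spaces between words.

```python
# Hypothetical, simplified stand-in for text_fixer in build_subs.py.
# The real function may differ; strip_spaces is an illustrative parameter.
REPLACE_ME = ['\n', '<', '>', '_', '=', '+', '*', '&', '^', '~', '\t']


def text_fixer(text, strip_spaces=False):
    """Remove unwanted characters from OCR output."""
    unwanted = list(REPLACE_ME)
    if strip_spaces:
        # Useful for Chinese, where the OCR engine inserts spurious spaces.
        unwanted.append(' ')
    for token in unwanted:
        text = text.replace(token, '')
    return text


print(text_fixer("Something to be like this\n"))
print(text_fixer("Something to be like this\n", strip_spaces=True))
```

With the space left out of the list, the first call keeps the words separated. The second call reproduces the behaviour reported in this issue.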
So I need to edit build_subs.py to fix the problem? Sorry, my programming knowledge is very basic.
Yeah, it's very easy: open build_subs.py, go to line 139, find the ' ' entry, delete it, and save the file. See the attached screenshot.
Ok. I'll try and see the results
Hm... it doesn't seem to have worked. I removed it and the code now looks like this:
def text_fixer(text, debug=False):
    replaceMe = ['\n', '“', '”', '、', '//', ';', '<',
                 '>', '-', '_', '=', '+', '*', '&', '^',
                 '%', '#', '@', '$', '.', '.', '~', '\t']
But the words still have no spaces. Another thing: the progress percentage is being saved too, above all the text, like this:
(95%)
Somethingtobelikethis
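One possible workaround for the stray progress line, sketched under assumptions: the marker is taken to be a percentage in parentheses alone on its own line, and drop_progress_lines is a hypothetical helper, not something that exists in build_subs.py. It filters the text after OCR rather than fixing wherever the percentage comes from.

```python
import re

# Assumed shape of the stray progress marker, e.g. "(95%)" alone on a line.
PROGRESS_LINE = re.compile(r'^\s*\(\d{1,3}%\)\s*$')


def drop_progress_lines(text):
    """Return text without lines that consist only of a progress marker."""
    kept = [line for line in text.split('\n') if not PROGRESS_LINE.match(line)]
    return '\n'.join(kept)


sample = "(95%)\nSomething to be like this"
print(drop_progress_lines(sample))
```

Lines that merely contain a percentage inside a sentence are left alone, since the pattern has to match the whole line.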
It's this entry: ' ' (the second-to-last one, just before '\t'). Remove it.
Yeah, I've already removed it and saved the file. I ran the script again and the problem was not fixed...