All text has no space between the words

Open Ruke805 opened this issue 4 years ago • 7 comments

Hi, I've tested your program. I edited build_subs.py to add my language to be recognized, and it works, but all the recognized words are written with no spaces between them.

"Something to be like this" ...was saved as "Somethingtobelikethis"

Could you help me?

Ruke805 avatar Jun 10 '20 20:06 Ruke805

Yeah, I know what causes that. Tesseract tries to put spaces anywhere it doesn't see a letter, so I added a line to remove spaces from Tesseract's output. That works for Chinese, which is what I was using it for, but obviously doesn't work for other languages.

In the text_fixer function, remove the space entry (' ') from the replaceMe list. It's the second-to-last entry, on line 139.
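As a rough illustration of why that entry matters, here is a minimal sketch of how a function like text_fixer could strip characters. The name `text_fixer_sketch`, the `keep_spaces` parameter, and the loop body are assumptions made for illustration; only the idea of a list of strings to remove comes from this thread, and the real function in build_subs.py may differ.

```python
def text_fixer_sketch(text, keep_spaces=True):
    """Hypothetical stand-in for text_fixer, not the repository's code."""
    # Strings to strip from the OCR output (a subset modelled on the thread).
    replace_me = ['\n', '“', '”', '、', '//', ';', '<', '>', '-', '_',
                  '=', '+', '*', '&', '^', '%', '#', '@', '$', '~', '\t']
    if not keep_spaces:
        # Stripping spaces suits Chinese, but glues Latin-script words together.
        replace_me.append(' ')
    for token in replace_me:
        text = text.replace(token, '')
    return text


print(text_fixer_sketch("Something to be like this\n"))
print(text_fixer_sketch("Something to be like this\n", keep_spaces=False))
```

With `keep_spaces=False` the sketch reproduces the reported "Somethingtobelikethis" output, while the default leaves the spaces between words intact.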

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/hamsolo474/VideoSubFinder_ocr_path/issues/1, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQFYIWFYHWWDVWQ3A4RTGLRV7VS7ANCNFSM4N2XXVUQ .

hamsolo474 avatar Jun 11 '20 01:06 hamsolo474

So I should edit build_subs.py to fix the problem? Sorry, my programming knowledge is very basic.

Ruke805 avatar Jun 11 '20 01:06 Ruke805

Yeah, it's very easy: open build_subs.py, go to line 139, find the ' ' entry, delete it, and save the file. See the attached screenshot.

hamsolo474 avatar Jun 11 '20 01:06 hamsolo474

OK, I'll try it and see the results.

Ruke805 avatar Jun 11 '20 02:06 Ruke805

Hm... it doesn't seem to have worked. I removed it, and the code now looks like this:

def text_fixer(text, debug=False):
    replaceMe = ['\n', '“', '”', '、', '//', ';', '<',
                 '>', '-', '_', '=', '+', '*', '&', '^',
                 '%', '#','@', '$', '.', '.', '~', '\t']

But the words still have no spaces. Another thing: the progress percentage is being saved too, above all the text, like this:

(95%)
Somethingtobelikethis
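One possible way to filter out a stray progress line like the one above, sketched under the assumption that it always sits on its own line in the form "(NN%)". The helper name `drop_progress_lines` and the pattern are hypothetical and are not taken from build_subs.py.

```python
import re

# Assumed shape of the unwanted line: a percentage in parentheses, e.g. "(95%)".
PROGRESS_LINE = re.compile(r'^\(\d{1,3}%\)$')


def drop_progress_lines(text):
    """Return text without lines that consist only of a progress marker."""
    kept = [line for line in text.splitlines()
            if not PROGRESS_LINE.match(line.strip())]
    return '\n'.join(kept)


print(drop_progress_lines("(95%)\nSomething to be like this"))
```

This only hides the symptom in the saved text; it does not explain why the progress output ends up in the subtitle in the first place.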

Ruke805 avatar Jun 11 '20 02:06 Ruke805

It's this, remove this.

On Thu, 11 Jun 2020, 10:13 theruleof4, [email protected] wrote:

Hm... it doesn't seem to have worked. I removed it, and the code now looks like this:

def text_fixer(text, debug=False):
    replaceMe = ['\n', '“', '”', '、', '//', ';', '<',
                 '>', '-', '_', '=', '+', '*', '&', '^',
                 '%', '#','@', '$', '.', '.', '~', ' ', '\t']

But the words still have no spaces. Another thing: the progress percentage is being saved too, above all the text, like this:

(95%)
Somethingtobelikethis

hamsolo474 avatar Jun 11 '20 02:06 hamsolo474

Yeah, I've already removed it and saved the file. I ran the script again, and the problem was not fixed...

Ruke805 avatar Jun 11 '20 15:06 Ruke805