videocr
Can anyone help me figure out how to use this?
I'm trying to understand how to make it work, but it's all very confusing. I'm using Windows 10, and I already have Python installed and Tesseract working and added to PATH, but I don't know how to get videocr running. I tried to follow what is explained in this issue: https://github.com/apm1467/videocr/issues/2 and created the file get_sub.py.
I put the video and all the scripts in the same folder, but when I run it, I get this error:
Traceback (most recent call last):
File "C:\Users\user\Programs\Python 3.7\venv\Lib\site-packages\videocr\get_sub.py", line 3, in
Could someone please help me?
I am trying to figure it out as well, despite having absolutely no background with this stuff. I think it's working as I type this. I would extract the videocr folder downloaded from here to your desktop instead (it's just simpler). Have you installed pip and then videocr, as suggested under the installation section?
We will try to figure this out together. Edit: it ended up working for me, as I said, but the results were not great. I've moved on to using SubRip and FineReader, which is quite tedious.
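For anyone else stuck at the get_sub.py step, here is a minimal sketch of what such a script might look like. It assumes videocr was installed with pip, that the script lives next to the video rather than inside site-packages, and that the file is named video.mp4 (a placeholder). The helper subtitle_path_for and the early file check are my own additions for illustration. The `if __name__ == '__main__':` guard is there because, as far as I understand, videocr uses multiprocessing, which on Windows needs the entry point to be guarded.

```python
# get_sub.py: keep this file next to the video, not inside site-packages.
from pathlib import Path


def subtitle_path_for(video_path):
    """Return the .srt path that sits next to the given video file."""
    return Path(video_path).with_suffix('.srt')


def main(video_path='video.mp4'):
    """Run videocr on one video; return True if subtitles were written."""
    if not Path(video_path).is_file():
        print(f'Video not found: {video_path}')
        return False
    try:
        # Imported here so a missing install gives a readable message.
        import videocr
    except ImportError:
        print('videocr is not installed; try "pip install videocr" first.')
        return False
    # lang is a Tesseract language code; 'eng' is only an example.
    videocr.save_subtitles_to_file(
        video_path,
        file_path=str(subtitle_path_for(video_path)),
        lang='eng',
    )
    return True


# Guard the entry point so worker processes on Windows do not re-run it.
if __name__ == '__main__':
    main()
```

Run it from a terminal opened in that folder with `python get_sub.py`. If it still fails, posting the full traceback would help.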
@bassSoul How well does Subrip work?
@johan456789 It depends on the quality and font formatting of the text, but overall it does a good job if the text is fairly standard. It sometimes generates duplicates or blanks, which you have to go through manually. This could also just be because the font I'm working with is brutal and I'm working with animation, which produces more false positives.
Note that ABBYY FineReader is required; SubRip alone won't do the trick. You need to OCR the images into separate .txt files named according to your exported images. I believe only FineReader is capable of doing this as a batch export.
@theruleof4 Unfortunately, even those of us who can make it work aren't getting good results. I suggest you look at the comments above to see if you can use SubRip or FineReader instead.
I was wondering if Subtitle Edit is capable of doing any or all of this process. It can OCR PGS subtitles with Tesseract, so it seems like it should technically be able to read the images put out by VideoSubFinder. I can't get SubRip to work, personally.
I don't know if I'm right, but isn't this code broken because the Tesseract data files were moved?
In the README you can find this link: this page
I was also checking some of the code in the constants.py file, and some of the data file URLs have been moved as well (the URLs now return a 404 status). You can see for yourself:
TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/raw/master/{}.traineddata'
TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/raw/master/script/{}.traineddata'
I think it is possible to fix this by updating the links, but since Tesseract has changed so much, I doubt it would still work.
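If anyone wants to check the links themselves, here is a rough sketch using only the Python standard library. The two templates are copied from the lines quoted above. The helper http_status is a name I made up for illustration, and 'eng' and 'HanS' in the usage note are only example data file names.

```python
import urllib.error
import urllib.request

# Templates as quoted above from constants.py.
TESSDATA_URL = 'https://github.com/tesseract-ocr/tessdata_fast/raw/master/{}.traineddata'
TESSDATA_SCRIPT_URL = 'https://github.com/tesseract-ocr/tessdata_best/raw/master/script/{}.traineddata'


def http_status(url, timeout=10):
    """Open the URL and return the HTTP status code without reading the body.

    Returns None if the server could not be reached at all.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; the code is what we want.
        return err.code
    except urllib.error.URLError:
        return None
```

For example, `http_status(TESSDATA_URL.format('eng'))` and `http_status(TESSDATA_SCRIPT_URL.format('HanS'))` report what each link returns right now. A 404 would confirm that the template needs updating.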
FYI I've created a working fork that uses PaddleOCR instead of Tesseract: https://github.com/oliverfei/videocr-PaddleOCR