ccextractor
[BUG] TESSDATA_PREFIX requires path separator at its end
Necessary information
- arguments: just the filename (including location):
$ TESSDATA_PREFIX=/nix/store/8lr60hp7yv0aysns056b74fsi8fm49zg-tesseract-3.05.00/share/ ./result/bin/ccextractor ~/Downloads/telecine.ts
- platform: NixOS (Linux)
- regression: I don't know
Video link
I expect this to reproduce with any file that uses tesseract (files that store subtitles as images). If it doesn't, then the location of the tesseract data must be handled differently for different files.
I used the one from #1104 (https://edge1.motv.eu/telecine.ts)
Additional information
TESSDATA_PREFIX is an environment variable that points to the directory containing the tessdata directory. For some reason, ccextractor requires TESSDATA_PREFIX to end with a /. It should work without one.
For example,
TESSDATA_PREFIX=/nix/store/8lr60hp7yv0aysns056b74fsi8fm49zg-tesseract-3.05.00/share
should work, but it doesn't.
Feel free to fix :-)
The relevant function is char* probe_tessdata_location(int lang_index) in ocr.c.
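A minimal sketch of a separator-tolerant path join, assuming the lookup appends tessdata/<lang>.traineddata to TESSDATA_PREFIX (the tesseract 3 layout described above). The function name build_traineddata_path is hypothetical; this is not the code in ocr.c, just an illustration of adding the / only when it is missing.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format for "<prefix><sep>tessdata/<lang>.traineddata". */
#define TRAINEDDATA_FMT "%s%stessdata/%s.traineddata"

/* Hypothetical helper (not ccextractor's actual code): builds the path to a
 * language's traineddata file, inserting a '/' only when the prefix does not
 * already end with one. Returns a malloc'd string the caller must free, or
 * NULL on error. */
char *build_traineddata_path(const char *prefix, const char *lang)
{
    size_t plen = strlen(prefix);
    /* Add a separator only if the prefix is non-empty and lacks a trailing '/'. */
    const char *sep = (plen > 0 && prefix[plen - 1] != '/') ? "/" : "";

    /* First pass measures the result, second pass writes it. */
    int len = snprintf(NULL, 0, TRAINEDDATA_FMT, prefix, sep, lang);
    if (len < 0)
        return NULL;
    char *path = malloc((size_t)len + 1);
    if (path == NULL)
        return NULL;
    snprintf(path, (size_t)len + 1, TRAINEDDATA_FMT, prefix, sep, lang);
    return path;
}
```

With this approach, the prefixes "/usr/share" and "/usr/share/" both yield "/usr/share/tessdata/eng.traineddata", so users no longer need to remember the trailing slash.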
This environment variable isn't documented; I only discovered it by reading ocr.c.
It should be documented as well.
TESSDATA_PREFIX is a tesseract environment variable, not ours (even though we use it).
Yes, but how is a user supposed to know they can use it? In the end, ccextractor implements it, so I believe it should be documented.
Go ahead :-)
This is a regression from this line:
https://github.com/CCExtractor/ccextractor/commit/5dbbe654f05f1b3e5fcdfd6633e6258bed216345#diff-06df1969161cf1684b04764b42380ce6R52
I'll let @anshul1912 comment and decide since it's his code and he knows what he's doing :-)
@NilsIrl did you test with both tesseract 3 and 4?
yes
I think this change will break tesseract 4 on Ubuntu; it may work on NixOS but break Ubuntu. What is the location of tessdata in your NixOS installation? If you are only using the environment variable TESSDATA_PREFIX, then, as you can see in the function, first priority is given to the environment variable. If NixOS has a default tessdata location but the environment variable is not set, then you must add that location to the probe function.
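The probing order described above (environment variable first, then known default locations) can be sketched as follows. This is a hypothetical illustration, not the actual probe_tessdata_location() from ocr.c: the names probe_tessdata_dir and has_traineddata, the caller-supplied list of default directories, and the choice to fall through to the defaults when TESSDATA_PREFIX is set but lacks the requested language are all assumptions. Each candidate is joined with tessdata/<lang>.traineddata, with a / added only when missing, so the separator question is handled in one place.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if <dir>[/]tessdata/<lang>.traineddata is readable, else 0.
 * A '/' is inserted only when dir does not already end with one. */
static int has_traineddata(const char *dir, const char *lang)
{
    size_t dlen = strlen(dir);
    const char *sep = (dlen > 0 && dir[dlen - 1] != '/') ? "/" : "";
    int len = snprintf(NULL, 0, "%s%stessdata/%s.traineddata", dir, sep, lang);
    if (len < 0)
        return 0;
    char *path = malloc((size_t)len + 1);
    if (path == NULL)
        return 0;
    snprintf(path, (size_t)len + 1, "%s%stessdata/%s.traineddata", dir, sep, lang);
    int ok = access(path, R_OK) == 0; /* POSIX: readable by this process? */
    free(path);
    return ok;
}

/* Hypothetical probe order: TESSDATA_PREFIX wins if it holds the language
 * data; otherwise each default directory is tried in order.
 * Returns a pointer into the environment or into `defaults` (do not free),
 * or NULL if nothing matches. */
const char *probe_tessdata_dir(const char *lang,
                               const char *const defaults[], size_t n_defaults)
{
    const char *env = getenv("TESSDATA_PREFIX");
    if (env != NULL && env[0] != '\0' && has_traineddata(env, lang))
        return env;
    for (size_t i = 0; i < n_defaults; i++)
        if (has_traineddata(defaults[i], lang))
            return defaults[i];
    return NULL;
}
```

A caller would pass its platform's default locations, for example an illustrative list such as { "/usr/share/", "/usr/local/share/" }; supporting a new distribution (such as NixOS) then means adding an entry to that list rather than changing the lookup logic.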