[BUG] TESSDATA_PREFIX requires path separator at its end

Open NilsIrl opened this issue 4 years ago • 10 comments

Necessary information

  • arguments: just the filename (including location):
$ TESSDATA_PREFIX=/nix/store/8lr60hp7yv0aysns056b74fsi8fm49zg-tesseract-3.05.00/share/ ./result/bin/ccextractor ~/Downloads/telecine.ts
  • platform: NixOS (Linux)
  • regression: I don't know

Video link

I expect this to reproduce with any file that uses tesseract (i.e. files that store subtitles as images). If it doesn't, that would mean the location of the tesseract data is handled differently for different files.

I used the one from #1104 (https://edge1.motv.eu/telecine.ts)

Additional information

TESSDATA_PREFIX is an environment variable that points to the directory containing the tessdata directory. For some reason, ccextractor requires TESSDATA_PREFIX to end with a /. It should also work without one.

e.g.

TESSDATA_PREFIX=/nix/store/8lr60hp7yv0aysns056b74fsi8fm49zg-tesseract-3.05.00/share

Should work but it doesn't.

NilsIrl • Dec 29 '19 22:12

Feel free to fix :-)

See char* probe_tessdata_location(int lang_index) in ocr.c.

cfsmp3 • Dec 29 '19 22:12

This environment variable isn't documented; I only found out about it by reading ocr.c.

It should also be documented.

NilsIrl • Dec 29 '19 22:12

TESSDATA_PREFIX is a tesseract environment variable, not ours (even though we use it).

cfsmp3 • Dec 30 '19 07:12

Yes, but how is a user supposed to know they can use it? In the end, ccextractor implements support for it, so I believe it should be documented.

NilsIrl • Dec 30 '19 13:12

Go ahead :-)

cfsmp3 • Dec 31 '19 09:12

This is a regression introduced by this line:

https://github.com/CCExtractor/ccextractor/commit/5dbbe654f05f1b3e5fcdfd6633e6258bed216345#diff-06df1969161cf1684b04764b42380ce6R52

NilsIrl • Jan 02 '20 17:01

I'll let @anshul1912 comment and decide since it's his code and he knows what he's doing :-)

cfsmp3 • Jan 02 '20 17:01

@NilsIrl did you test with both tesseract 3 and 4?

cfsmp3 • Jan 10 '20 01:01

yes

NilsIrl • Jan 10 '20 07:01

I think this will break version 4 on Ubuntu: it may work on NixOS, but break Ubuntu. What is the location of tessdata in your NixOS installation? If you are relying only on the TESSDATA_PREFIX environment variable, then, as you can see in the function, the environment variable is given first priority. If NixOS has a default tessdata location and the environment variable is not set, then you must add that location to the probe function.

anshul1912 • Jan 15 '20 03:01