
tessedit_load_sublangs directive does not respect cache/langPath

Open ghost opened this issue 3 years ago • 2 comments

Error opening data file ./chi_tra.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_tra'

The Korean language traineddata has a 'tessedit_load_sublangs chi_tra' entry, and this causes an error. I am not sure why, as the two languages do not overlap, unlike Japanese.

I tried copying the chi_tra trained data to my working directory, then my traineddata folder, then into the tesseract and core packages... nothing works.

ghost avatar Nov 08 '21 04:11 ghost

Same here. Using Vue, I can't figure out where I should place my own .traineddata file or how to get rid of the TESSDATA_PREFIX error.

I tried the offline example, but somehow it doesn't work in a Vue app.

trpls1x avatar Nov 10 '21 13:11 trpls1x

If I understand the code correctly, worker.loadLanguage() first loads the traineddata into the wasm file system. So, manually using:

await worker.loadLanguage('subLang+parentLang');
await worker.initialize('parentLang');

should do the trick, since the sub-language would be already in the file system before loading the parent language.
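
For the Korean case from the original report, the same idea could be wrapped in a small helper. This is only a sketch: languageLoadString and initWithSublangs are hypothetical names, not part of tesseract.js, and the worker argument is assumed to expose the loadLanguage() and initialize() methods used above.

```javascript
// Hypothetical helper: build the "sub+parent" string passed to loadLanguage().
// Sub-languages are listed first, followed by the parent language.
function languageLoadString(parentLang, subLangs) {
  return [...subLangs, parentLang].join('+');
}

// Hypothetical helper: load the sub-languages together with the parent,
// then initialize only the parent. `worker` is assumed to be a tesseract.js
// worker that exposes loadLanguage() and initialize().
async function initWithSublangs(worker, parentLang, subLangs) {
  // Put every traineddata file into the wasm file system first...
  await worker.loadLanguage(languageLoadString(parentLang, subLangs));
  // ...so the tessedit_load_sublangs lookup can find the sub-language.
  await worker.initialize(parentLang);
}
```

With an already created worker, the call for this issue would be initWithSublangs(worker, 'kor', ['chi_tra']), which loads 'chi_tra+kor' and then initializes 'kor'.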

chungym avatar Dec 25 '21 14:12 chungym

@chungym's understanding is correct: languages need to be loaded using that syntax before they are used. Tesseract does not interact directly with your filesystem.

Closing as the OP deleted his account, so no further response is possible.

Balearica avatar Sep 18 '22 06:09 Balearica