tesseract.js

When I load chi_sim with the 4.0.0_best tessdata, the console throws a warning message

Open lmk123 opened this issue 3 years ago • 3 comments

Describe the bug When I load simplified Chinese chi_sim with the 4.0.0_best tessdata, the console throws a warning message:

Error opening data file ./chi_sim_vert.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim_vert'

To Reproduce Steps to reproduce the behavior:

  1. Go to https://codesandbox.io/s/fancy-sun-91ezh?file=/src/index.js
  2. Open Chrome devtools

Expected behavior No warning message.

Desktop:

  • OS: Windows 10
  • Browser: Chrome
  • Version: 89

lmk123 avatar Mar 04 '21 20:03 lmk123

I have also run into this issue, and I'm not really sure whether it's a problem or not. The only things I can add are:

  • It doesn't occur when I use a browser on Linux, only a browser on Windows.
  • I verified that chi_sim_vert.traineddata.gz is reachable from the Windows browser.
  • I ran the same code against chi_tra and received the same error, but this time it couldn't find "chi_tra_vert".

bitsandbytes avatar Feb 14 '22 19:02 bitsandbytes

While writing this up I came up with something else to try: I updated the call to loadLanguage() to use "chi_sim+chi_sim_vert" and the warning went away. I hope this means I "fixed" it.

I haven't looked further yet to see the exact cause of the issue - who is assuming vert is loaded when it isn't?
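
For anyone who wants to try the same workaround, here is a minimal sketch of it. It assumes the worker API from tesseract.js v2 to v4 (load, loadLanguage, initialize). withVerticalModels and prepareChineseWorker are hypothetical helper names, not part of tesseract.js, and passing only the original languages to initialize() is an assumption based on the comment above, which changed only the loadLanguage() call.

```javascript
// Hypothetical helper (not part of tesseract.js): add the "_vert" companion
// model for each language and join everything with "+".
function withVerticalModels(langs) {
  const all = new Set();
  for (const lang of langs) {
    all.add(lang);
    if (!lang.endsWith('_vert')) all.add(`${lang}_vert`);
  }
  return [...all].join('+');
}

// Sketch of the workaround. The worker is passed in, so this function does
// not depend on how (or with which options) the worker was created.
async function prepareChineseWorker(worker, langs = ['chi_sim']) {
  await worker.load();
  // Fetch both the base model and its vertical companion,
  // e.g. "chi_sim+chi_sim_vert".
  await worker.loadLanguage(withVerticalModels(langs));
  // Assumption: initialize with the original languages only, since the
  // workaround above only changed the loadLanguage() argument.
  await worker.initialize(langs.join('+'));
  return worker;
}
```

This only makes sense for languages that actually have a _vert companion file, such as the chi_sim and chi_tra cases mentioned above.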

bitsandbytes avatar Feb 14 '22 19:02 bitsandbytes

I got the same warning as you. I guessed it was due to an unstable network connection, so I deleted chi_sim.traineddata.gz and downloaded it again. After that the warning was gone and the program works.

Selenium39 avatar May 10 '22 16:05 Selenium39

I believe the core issue here is that invalid .traineddata files are cached and never deleted without manual intervention. In #753 I changed this so that the cache is cleared if the .traineddata file is found to be invalid, which I expect will resolve the problem. After this change, if you fail to download a valid .traineddata file, Tesseract.js will try to download the file again the next time you run the script.

This is currently in the master branch and will be in the next npm release (4.0.6). Let me know if anybody encounters this issue subsequent to that.
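
To illustrate the idea (this is not the code from #753, just a rough sketch of the behavior described above): an invalid cached entry is dropped, and only data that passes validation is stored, so a failed download gets retried on the next run. The names getTrainedData, cache, download, and isValid are hypothetical and are not tesseract.js APIs.

```javascript
// Illustrative only: `cache` is any Map-like store, and `download` and
// `isValid` are caller-supplied functions.
async function getTrainedData(cache, lang, download, isValid) {
  const cached = cache.get(lang);
  if (cached !== undefined) {
    if (isValid(cached)) return cached;
    // A corrupt or truncated file would otherwise be reused on every run.
    cache.delete(lang);
  }

  const fresh = await download(lang);
  // Only store data that passed validation, so a failed download is
  // retried on the next call instead of being cached.
  if (!isValid(fresh)) {
    throw new Error(`Invalid traineddata for ${lang}`);
  }
  cache.set(lang, fresh);
  return fresh;
}
```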

Balearica avatar May 12 '23 03:05 Balearica