tesseract.js
When I load chi_sim with the 4.0.0_best tessdata, the console throws a warning message
Describe the bug
When I load Simplified Chinese (chi_sim) with the 4.0.0_best tessdata, the console logs this warning:
Error opening data file ./chi_sim_vert.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim_vert'
To Reproduce
Steps to reproduce the behavior:
- Go to https://codesandbox.io/s/fancy-sun-91ezh?file=/src/index.js
- Open Chrome devtools
Expected behavior
No warning message.
Desktop:
- OS: Windows 10
- Browser: Chrome
- Version: 89
Additional context
I have also run into this issue and am not really sure whether it's a problem. The only things I can add are:
- I discovered that it doesn't occur in a Linux browser, only in a Windows browser;
- I verified that chi_sim_vert.traineddata.gz is reachable from the Windows browser; and
- I ran the same code against chi_tra and again got the error, but this time it couldn't find "chi_tra_vert".
While writing this up, I thought of something else to try: I updated the call to loadLanguage() to use "chi_sim+chi_sim_vert", and the warning went away. I hope this means I "fixed" it.
I haven't yet looked into the exact cause. What is assuming the vert model is loaded when it isn't?
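The loadLanguage() workaround can be wrapped in a small helper. This is a sketch, not part of tesseract.js: `withVerticalCompanions` and `loadWithCompanions` are hypothetical names, the `_vert` mapping covers only the two languages reported in this thread, and it assumes the v2-era worker API in which `loadLanguage()` and `initialize()` are separate calls. As in the workaround, only the string passed to `loadLanguage()` gains the vertical model; `initialize()` still receives the language you asked for.

```javascript
// Hypothetical mapping: languages reported in this thread and the
// vertical-text model each one tried to open.
const VERTICAL_COMPANIONS = {
  chi_sim: 'chi_sim_vert',
  chi_tra: 'chi_tra_vert',
};

// Expand a '+'-joined language string, e.g. 'chi_sim' -> 'chi_sim+chi_sim_vert'.
// Companions already present are not added twice.
function withVerticalCompanions(langs) {
  const requested = langs.split('+');
  const expanded = [...requested];
  for (const lang of requested) {
    const vert = VERTICAL_COMPANIONS[lang];
    if (vert && !expanded.includes(vert)) expanded.push(vert);
  }
  return expanded.join('+');
}

// Assumes a worker exposing async loadLanguage() and initialize()
// (the v2-era tesseract.js API). Only the download list is expanded.
async function loadWithCompanions(worker, langs) {
  const toLoad = withVerticalCompanions(langs);
  await worker.loadLanguage(toLoad);
  await worker.initialize(langs);
  return toLoad;
}
```

With a real worker, you would call `loadWithCompanions(worker, 'chi_sim')` where you currently call `loadLanguage()` and `initialize()`; check the API of your tesseract.js version first, since worker setup changed between major releases.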
I get the same warning. I guessed it was caused by an unstable network connection, so I deleted chi_sim.traineddata.gz and re-downloaded it. After that, the warning was gone and the program worked.
I believe the core issue here is that invalid .traineddata files are cached and never deleted without manual intervention. In #753 I changed this so that the cache is cleared if the .traineddata file is found to be invalid, which I expect will resolve the issue. After this change, if you fail to download a valid .traineddata file, Tesseract.js will try to download it again the next time you run the script.
This is currently in the master branch and will be in the next npm release (4.0.6). Let me know if anybody encounters this issue after that.
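The cache behavior described for #753 can be sketched as a load, validate, evict pattern. This is an illustration under stated assumptions, not the code from that PR: a `Map` stands in for the browser's persistent cache, and `loadTrainedData`, `download`, and `isValid` are hypothetical names, with `isValid` standing in for whatever check decides that a .traineddata file is unusable.

```javascript
/**
 * Hypothetical sketch: return cached language data if it is valid;
 * otherwise evict the bad entry and download a fresh copy. Invalid
 * downloads are never written to the cache, so the next run retries
 * instead of reusing bad data indefinitely.
 */
async function loadTrainedData(cache, lang, download, isValid) {
  const key = `${lang}.traineddata`;
  if (cache.has(key)) {
    const cached = cache.get(key);
    if (isValid(cached)) return cached;
    // Invalid cached copy: clear it rather than keeping it forever.
    cache.delete(key);
  }
  const fresh = await download(lang);
  if (!isValid(fresh)) {
    // Nothing is cached here, so the next call downloads again.
    throw new Error(`Downloaded data for '${lang}' is invalid`);
  }
  cache.set(key, fresh);
  return fresh;
}
```

The key design point is that a bad file is never allowed to persist: a corrupt cache entry is evicted on read, and an invalid download is rejected before it is written, which is why a second run can recover without manually deleting anything.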