tesseract.js
Cannot use cachePath with Scheduler
Describe the bug Cannot use cachePath with Scheduler
To Reproduce Steps to reproduce the behavior:
- Create Scheduler
- Create N workers with cachePath
- Start OCR
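The report does not include code, so the following is only a minimal sketch of what the steps above might look like, assuming the tesseract.js v2/v3 worker API (`createWorker`, `load`, `loadLanguage`, `initialize`, `createScheduler`, `addWorker`, `addJob`). The function name `reproduce` and its parameters are hypothetical, and starting all workers concurrently is an assumption about how the reporter set things up. The module is passed in as an argument so the sketch has no hard dependency on a particular version.

```javascript
// Hypothetical reproduction sketch; `tesseract` is the module object,
// e.g. require('tesseract.js'). Not taken from the original report.
async function reproduce(tesseract, { workerCount, cachePath, lang, image }) {
  const scheduler = tesseract.createScheduler();

  // Assumption: all workers start at once, so they load the language
  // data from the same cachePath concurrently.
  await Promise.all(
    Array.from({ length: workerCount }, async () => {
      // `await` is harmless if createWorker returns the worker synchronously.
      const worker = await tesseract.createWorker({ cachePath });
      await worker.load();
      await worker.loadLanguage(lang);
      await worker.initialize(lang);
      scheduler.addWorker(worker);
    })
  );

  try {
    // Start OCR; the scheduler hands the job to an idle worker.
    return await scheduler.addJob('recognize', image);
  } finally {
    await scheduler.terminate();
  }
}
```

A call might look like `reproduce(require('tesseract.js'), { workerCount: 4, cachePath: '.', lang: 'lav', image: 'page.png' })`, where the worker count and image path are placeholders.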
Expected behavior No errors and able to load languages
Desktop (please complete the following information):
- OS: MacOS
- Browser: node
- Version: v14.17.6
Additional context
Error opening data file ./lav.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Error opening data file ./lav.traineddata
Failed loading language 'lav'
Tesseract couldn't load any languages!
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'lav'
Tesseract couldn't load any languages!
I was trying to do something similar. I think it might have to do with some kind of race condition when reading the files containing our local models. I noticed that when I forced my workers to run non-concurrently, I had no issues opening my models. I was hoping the scheduler had support for 'initialize' or 'load' language jobs.
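For anyone wanting to try the same workaround, here is a minimal sketch of initializing workers one at a time instead of all at once. The helper name `initSequentially` and the `initWorker` callback are hypothetical, not part of tesseract.js; the callback is assumed to create a worker, load the language, and resolve with the ready worker.

```javascript
// Hypothetical helper: runs `initWorker` for each index strictly in order,
// so no two workers load language data at the same time.
async function initSequentially(count, initWorker) {
  const workers = [];
  for (let i = 0; i < count; i += 1) {
    // Awaiting inside the loop means the next worker does not start
    // until the previous one has finished initializing.
    workers.push(await initWorker(i));
  }
  return workers;
}
```

The resulting workers could then be added to a scheduler with `scheduler.addWorker(worker)`. This trades a slower startup for the guarantee that the language files are only read by one worker at a time.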
Disregard, this is not the cause. ~~I believe by default Tesseract both reads from and writes to the file you specify using cachePath. Therefore, I do think it makes sense that there could be issues when using it simultaneously on multiple threads. We could consider adding a note to the documentation and/or checking whether cachePath arguments are being reused across workers and throwing a warning if detected.~~
I was able to reproduce this bug in version 2, but not the current version (v3). If this issue is still active, please confirm you still encounter this bug in the latest version and provide a reproducible example.
Closing as stale.