Using libtesseract on Windows
I tried to use libtesseract302.dll (from https://github.com/mnadeem/ocr-tess4j-example), but got:
AttributeError: function 'TessBaseAPIGetDatapath' not found
Then I tried to use libtesseract400.dll (from https://github.com/nguyenq/tess4j, which depends on https://github.com/nguyenq/lept4j),
but it seems that libtesseract400.dll is not in libtesseract.tesseract_raw.libnames.
By the way, on Windows at least, ctypes.cdll.LoadLibrary searches for the DLL in the directories listed in the PATH environment variable:
https://github.com/openpaperwork/pyocr/blob/ce23c2492739bef2b5313d257b1705e605d8ebcd/src/pyocr/libtesseract/tesseract_raw.py#L31
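The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in tesseract_raw.py: the function name load_first_available, its loader parameter, and the CANDIDATE_NAMES list are assumptions made up for the example. It only relies on ctypes.cdll.LoadLibrary raising OSError when a library cannot be loaded.

```python
import ctypes

# Hypothetical candidate list for illustration; the real list lives in
# pyocr's tesseract_raw.py and may differ.
CANDIDATE_NAMES = [
    "libtesseract400.dll",
    "libtesseract305.dll",
    "libtesseract304.dll",
    "libtesseract.dll",
]


def load_first_available(names, loader=ctypes.cdll.LoadLibrary):
    """Return (name, handle) for the first library that loads, else None."""
    for name in names:
        try:
            return name, loader(name)
        except OSError:
            # The loader could not find or open this name; try the next one.
            continue
    return None


if __name__ == "__main__":
    found = load_first_available(CANDIDATE_NAMES)
    print("loaded:", found[0] if found else "none of the candidates")
```

With a loop like this, a DLL whose file name is not in the candidate list is never tried, even if it sits in a directory the loader would search.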
I think it's easy to fix, but why not bundle libtesseract with the package? Maybe that would make it easier to use.
I tried to use libtesseract302.dll (from https://github.com/mnadeem/ocr-tess4j-example)
- Windows support for libtesseract is based on contributions. I personally don't use it (I use pyocr.tesseract for my project on Windows), so the list of .dll names to try to load is probably not up to date at all. Please don't hesitate to tell me if you need some new ones to be added.
- Tesseract 3.02 is known for not working well with Pyocr (on GNU/Linux anyway). Even if the binding did work, is_available() would have returned False. You should try with Tesseract >= 3.04.
- I don't know where those repositories come from, but they seem intended to be used with tess4j (Java); are they patched specifically for tess4j? Anyway, I think you should use more official/direct sources for your Tesseract installation: https://github.com/tesseract-ocr/tesseract/wiki/Downloads ; https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
- AFAIK, Tesseract 4 is still in alpha. Pyocr supports it on Linux, but I cannot yet guarantee good support on Windows.
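The AttributeError reported at the top of the thread is what ctypes raises when a loaded library does not export a requested function. A rough sketch of checking for that before using a DLL is shown below. It is an assumption-laden illustration, not how Pyocr's is_available() is implemented: the helper name missing_symbols and the REQUIRED_SYMBOLS list are hypothetical, though TessVersion and TessBaseAPIGetDatapath are function names from the Tesseract C API.

```python
# Illustrative subset of C API functions a binding might need (assumption:
# the real binding requires many more than these two).
REQUIRED_SYMBOLS = ["TessVersion", "TessBaseAPIGetDatapath"]


def missing_symbols(lib, required=REQUIRED_SYMBOLS):
    """Return the names in `required` that `lib` does not expose.

    `lib` is expected to be a ctypes library handle; attribute access on
    such a handle raises AttributeError for functions the DLL lacks.
    """
    missing = []
    for name in required:
        try:
            getattr(lib, name)
        except AttributeError:
            missing.append(name)
    return missing
```

A non-empty result would indicate that the DLL is too old (or otherwise incompatible) for the binding, which matches the symptom seen with libtesseract302.dll.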
I think it's easy to fix, but why not bundle libtesseract with the package? Maybe that would make it easier to use.
Because if we went this way, for consistency, I would also have to package Tesseract.exe, Cuneiform, and the language data files of both Tesseract and Cuneiform.
Thank you very much for granting me so much of your valuable time.
I don't know where those repositories come from
I was just too lazy to compile libtesseract myself, so I searched GitHub for prebuilt binaries...
I tried the win32 build listed as (3rd party - @parrot-office) at https://github.com/tesseract-ocr/tesseract/wiki/Downloads, but it has to be used together with many pvt.cppan.demo.xxx.dll files.
_(:з」∠)_ Maybe I should try to compile it...
Please don't hesitate to tell me if you need some new ones to be added.
Maybe these names can be added:
libtesseract304.dll
libtesseract305.dll
libtesseract400.dll
libtesseract.dll
Maybe these names can be added:
Done: 2d6ead7e9e3031d7b2efa3ccfdb37ece291a9b66