Tess4j - Error opening tessdata file with non-ASCII path

Status: Open. AliaksandrKi opened this issue 5 years ago • 4 comments

OS: Windows 10
IDE: IntelliJ
tess4j: 4.5.1

I have two folders on my disk containing identical 'eng.traineddata' files:

c:/data/eng.traineddata
c:/дата/eng.traineddata 

Tesseract fails when running the following code:

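import java.io.File;
import net.sourceforge.tess4j.Tesseract;

public class Repro {
    public static void main(String[] args) throws Exception {
        Tesseract instance = new Tesseract();
        // instance.setDatapath("c:/data");  // works without issues
        instance.setDatapath("c:/дата");     // fails: see the error message below
        instance.setLanguage("eng");
        String result = instance.doOCR(new File("c:/numbers.jpg")); // throws TesseractException
    }
}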

Error message:

Error opening data file c:/дата/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!

AliaksandrKi, Jul 22 '20 16:07

The error is pretty clear: you can't have non-ASCII characters in the tessdata path. 'д' is not an ASCII character.
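Following this answer, one option is to fail fast before calling `setDatapath`: scan the path for any character above U+007F (the end of the ASCII range). This is a minimal sketch; `AsciiPathCheck` and `firstNonAscii` are hypothetical helpers, not part of the tess4j API. The Cyrillic path is written with Unicode escapes so the source compiles regardless of the compiler's default encoding.

```java
import java.util.OptionalInt;

public class AsciiPathCheck {
    // Returns the index of the first character outside the ASCII range
    // (above U+007F), or an empty OptionalInt if the path is pure ASCII.
    static OptionalInt firstNonAscii(String path) {
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) > 0x7F) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        String datapath = "c:/\u0434\u0430\u0442\u0430"; // the Cyrillic path from the issue
        OptionalInt bad = firstNonAscii(datapath);
        if (bad.isPresent()) {
            int i = bad.getAsInt();
            System.out.printf("Non-ASCII character U+%04X at index %d%n",
                    (int) datapath.charAt(i), i);
        } else {
            System.out.println("Path is pure ASCII");
        }
    }
}
```

For the path in this issue it prints `Non-ASCII character U+0434 at index 3`, pointing at the 'д'.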

nguyenq, Jul 22 '20 19:07

@nguyenq thanks for the feedback! Could you provide a bit more context? For example, is the root cause on the Tesseract side or on the wrapper side? Are there any workarounds available, or any plans to support non-ASCII paths?

Snipx, Jul 23 '20 22:07

It could be in JNA, or it could be inside the Tesseract native code. On Linux, Tesseract and its tessdata directory are installed in standard system directories, so I doubt the Tesseract code would ever need to handle non-ASCII characters in those paths.
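The JNA suspicion comes down to the handoff between Java and C. A Java `String` is UTF-16 internally, so it has to be converted to a byte sequence (a C `char*`) before native code can use it as a path. The JDK-only sketch below, which does not call tess4j or JNA, shows that the same path produces different bytes depending on the charset, and that a charset without Cyrillic letters silently replaces 'д' with '?'. Which charset JNA actually uses for this call, and how Tesseract then interprets the bytes on Windows, are assumptions this thread does not settle. `PathBytesDemo` and `roundTrip` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PathBytesDemo {
    // Encodes the path to bytes (roughly what a native char* buffer would hold)
    // and decodes it back; characters the charset cannot map become '?'.
    static String roundTrip(String path, Charset charset) {
        byte[] bytes = path.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String datapath = "c:/\u0434\u0430\u0442\u0430"; // the Cyrillic path from the issue
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1}) {
            boolean intact = roundTrip(datapath, cs).equals(datapath);
            System.out.println(cs + ": " + datapath.getBytes(cs).length
                    + " bytes, round trip " + (intact ? "intact" : "lossy"));
        }
    }
}
```

UTF-8 needs 11 bytes (two per Cyrillic letter) and survives the round trip; ISO-8859-1 yields 7 bytes and comes back as `c:/????`, which names a folder that does not exist.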

On Windows, you may want to try a relative path that does not contain non-ASCII characters, to see whether that works.
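A minimal sketch of the relative-path suggestion. It assumes the native code resolves relative paths against the process working directory, which is typically what the JVM reports as `user.dir`, and that tessdata sits somewhere below that directory. That covers a common case, such as an application under `C:\Users\<non-ASCII name>\app` with `tessdata` inside it, where only the part above the working directory is non-ASCII. `RelativeDatapath` and `asciiRelative` are hypothetical names; the result would be passed to `setDatapath`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class RelativeDatapath {
    // Returns tessdataDir relative to baseDir, but only when tessdataDir lies
    // below baseDir and the relative form is pure ASCII.
    static Optional<String> asciiRelative(Path baseDir, Path tessdataDir) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path target = tessdataDir.toAbsolutePath().normalize();
        if (!target.startsWith(base)) {
            return Optional.empty(); // outside the base directory
        }
        String rel = base.relativize(target).toString();
        if (rel.isEmpty()) {
            rel = "."; // tessdataDir is the base directory itself
        }
        boolean ascii = rel.chars().allMatch(c -> c <= 0x7F);
        return ascii ? Optional.of(rel) : Optional.empty();
    }

    public static void main(String[] args) {
        // Relative paths are usually resolved against the working directory.
        Path workingDir = Paths.get(System.getProperty("user.dir"));
        Path tessdata = workingDir.resolve("tessdata"); // hypothetical location
        System.out.println(asciiRelative(workingDir, tessdata)
                .orElse("<no ASCII-only relative path>"));
    }
}
```

If the helper returns empty, the relative-path workaround cannot help from that working directory, and moving tessdata to an ASCII-only location remains the fallback.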

This may be related to issue https://github.com/nguyenq/tess4j/issues/75.

nguyenq, Jul 23 '20 23:07

Failures can occur when non-ASCII characters appear in the source file name, the data file names, or the target file name. Meanwhile, the same file names work when the tesseract command is run through ProcessBuilder.
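A minimal sketch of the ProcessBuilder route, assuming the `tesseract` executable (4.x, which supports `--tessdata-dir`, `-l`, and the special output base `stdout`) is on the PATH. It bypasses tess4j and JNA entirely, which may be why the same paths work this way. `TesseractCli`, `buildCommand`, and `runOcr` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class TesseractCli {
    // Builds the command line; "stdout" makes tesseract print the recognized
    // text instead of writing <outputbase>.txt.
    static List<String> buildCommand(String image, String tessdataDir, String lang) {
        return List.of("tesseract", image, "stdout",
                "--tessdata-dir", tessdataDir, "-l", lang);
    }

    static String runOcr(String image, String tessdataDir, String lang)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(image, tessdataDir, lang));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show tesseract's diagnostics
        Process process = pb.start();
        String text;
        try (InputStream out = process.getInputStream()) {
            text = new String(out.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("tesseract exited with code " + exit);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        // the Cyrillic tessdata folder from the issue, written with Unicode escapes
        System.out.println(runOcr("c:/numbers.jpg", "c:/\u0434\u0430\u0442\u0430", "eng"));
    }
}
```

Reading stdout fully before `waitFor()` avoids blocking if tesseract produces more output than the pipe buffer holds.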

As you suggested, the cause may be on the Java side, in how file names are handled through the platform's native API. A related JDK bug: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8205991
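One hedged way to test the "Java side" hypothesis on a given machine is to check whether the path can be represented at all in the charset the JDK uses for platform file names. This sketch assumes that charset is the one reported by the internal, implementation-specific `sun.jnu.encoding` property (with `file.encoding` as a fallback). It is a diagnostic only: it does not show that JDK bug 8205991 describes this exact mechanism, and a `true` result does not rule out a problem in JNA or Tesseract. `PlatformPathEncoding` and `canRepresent` are hypothetical names.

```java
import java.nio.charset.Charset;

public class PlatformPathEncoding {
    // Returns true if every character of the path can be encoded in the
    // given charset, i.e. converting the path would not be lossy.
    static boolean canRepresent(String path, Charset charset) {
        return charset.newEncoder().canEncode(path);
    }

    public static void main(String[] args) {
        // sun.jnu.encoding is not a supported public property; it may be absent.
        String jnu = System.getProperty("sun.jnu.encoding", System.getProperty("file.encoding"));
        Charset platform = Charset.forName(jnu);
        String datapath = "c:/\u0434\u0430\u0442\u0430"; // the Cyrillic path from the issue
        System.out.println("Platform path charset: " + platform);
        System.out.println("Path representable: " + canRepresent(datapath, platform));
    }
}
```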

Mararsh, Oct 08 '20 06:10