Quan Nguyen

29 comments by Quan Nguyen

I haven't tested the updated vie.traineddata; however, the vie.wordlist file found at https://github.com/tesseract-ocr/langdata/tree/master/vie is still plagued with errors. Many of the words seem to be corrupted (broken UTF-8 encoding) as...
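One way to check a word list for broken UTF-8 is to decode it strictly and report the first line that fails. The following is only a minimal sketch using the standard Java charset API; the class and method names (`Utf8Check`, `isCleanUtf8`, `firstBadLine`) are made up for illustration and are not part of Tesseract or langdata.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of any Tesseract tooling.
final class Utf8Check {

    /** True if the bytes decode as strict UTF-8 and contain no U+FFFD replacement character. */
    static boolean isCleanUtf8(byte[] bytes) {
        try {
            String s = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
            return s.indexOf('\uFFFD') < 0;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns the 1-based number of the first line that is not clean UTF-8, or -1 if none. */
    static int firstBadLine(byte[] data) {
        int line = 1;
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            // Byte 0x0A never occurs inside a multi-byte UTF-8 sequence, so splitting on it is safe.
            if (i == data.length || data[i] == '\n') {
                if (!isCleanUtf8(Arrays.copyOfRange(data, start, i))) {
                    return line;
                }
                line++;
                start = i + 1;
            }
        }
        return -1;
    }
}
```

The file contents could be passed in via `Files.readAllBytes`. Note that this only catches byte-level damage: words that were double-encoded (mojibake) are still valid UTF-8 and would pass this check.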

With the 4.00alpha vie language pack, many non-Vietnamese characters appear in the output text, such as: öïäåů€†čµñÎīšçðßęě
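To quantify this kind of noise, OCR output can be scanned for non-ASCII characters that are not Vietnamese letters. The sketch below is an illustration only: the class `VietCharFilter` is hypothetical, and it assumes that every ASCII character is acceptable and that any other character outside the Vietnamese alphabet (including typographic punctuation) should be flagged. The allowed set is built by composing the twelve Vietnamese vowels with the five tone marks, plus the letter d with stroke.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for inspecting OCR output; not part of Tesseract.
final class VietCharFilter {
    // Vietnamese vowels in lowercase, precomposed form (a, a-breve, a-circumflex, e, ...).
    private static final String VOWELS = "a\u0103\u00E2e\u00EAio\u00F4\u01A1u\u01B0y";
    // Combining tone marks: grave, acute, hook above, tilde, dot below.
    private static final String TONES = "\u0300\u0301\u0309\u0303\u0323";
    private static final Set<Integer> ALLOWED = buildAllowed();

    private static Set<Integer> buildAllowed() {
        Set<Integer> set = new HashSet<>();
        set.add(0x0111); // d with stroke
        for (char v : VOWELS.toCharArray()) {
            set.add((int) v);
            for (char t : TONES.toCharArray()) {
                // NFC composes vowel + tone mark into a single precomposed code point.
                String composed = Normalizer.normalize("" + v + t, Normalizer.Form.NFC);
                set.add(composed.codePointAt(0));
            }
        }
        return set;
    }

    /** Returns the non-ASCII characters of the text that are not Vietnamese letters, in order. */
    static String unexpected(String text) {
        StringBuilder out = new StringBuilder();
        Normalizer.normalize(text, Normalizer.Form.NFC).codePoints()
                .filter(cp -> cp > 0x7F && !ALLOWED.contains(Character.toLowerCase(cp)))
                .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

An explicit set is used instead of simply stripping diacritics because some marks are only valid on certain letters: a tilde is a Vietnamese tone mark on vowels, but `ñ` is still not a Vietnamese letter.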

According to its [readme](https://github.com/nguyenq/VietOCR3/blob/master/readme.html) file:

> You can put init-only and non-init control parameters in tessdata/configs/tess_configs and tess_configvars files, respectively, to modify Tesseract's behaviour.

Similar problem and potential fix?

https://groups.google.com/forum/#!topic/jna-users/aGxbNtanTSU
https://github.com/matthiasblaesing/javatellstick/commit/e7e14ce7465859e4dc8c507545f42041ddb0e086

Preliminary testing indicated that it would work on Linux but break on Windows.

It seems your method calls -- `Process` and `BeginDocument` -- are in reverse order. Check out the [test case](https://github.com/charlesw/tesseract/blob/feature/321-Tesseract-4/src/Tesseract.Tests/ResultRendererTests.cs) for an example.

If inversion is really needed, Leptonica's native `pixInvert` function can be used.

Try: `api.SetVariable("debug_file", "/dev/null");`

You can use your Arabic input method to enter Arabic digits, or use the built-in conversion tool. In the **Character** textbox, enter, e.g., `U+0668` and click the adjacent button twice or...

The error is pretty clear: you can't have non-ASCII characters in `tessdata` path. 'д' is not an ASCII character.
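A quick pre-check before initializing the engine can catch this early. This is just a sketch with the standard Java charset API; `TessdataPathCheck` is a made-up name, and the paths in the usage comment are examples only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; call it on the tessdata path before handing the path to Tesseract.
final class TessdataPathCheck {

    /** True if every character of the path can be encoded as US-ASCII. */
    static boolean isAsciiOnly(String path) {
        // A fresh encoder is created per call because CharsetEncoder is not thread-safe.
        return StandardCharsets.US_ASCII.newEncoder().canEncode(path);
    }
}

// Example usage (paths are illustrative):
//   TessdataPathCheck.isAsciiOnly("/usr/share/tessdata")     is true
//   TessdataPathCheck.isAsciiOnly("/home/\u0434/tessdata")   is false
```

If the check fails, moving or copying `tessdata` to an ASCII-only location is the simplest workaround.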

It could be JNA or it could be inside Tesseract native code. On Linux, Tesseract and its `tessdata` directory are placed in standard system directories, so I doubt Tesseract code...