Quan Nguyen comments

Issue 1392: Vietnamese dictionaries

I haven't tested the updated vie.traineddata; however, the vie.wordlist file found in https://github.com/tesseract-ocr/langdata/tree/master/vie is still plagued with errors. Many of the words seem to be corrupted (broken UTF-8 encoding) as...
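One way to narrow down this kind of corruption is to check whether each line of the wordlist decodes as strict UTF-8. The following is a minimal sketch using only the Java standard library; the class and method names are hypothetical and not part of Tesseract or langdata. It only catches byte-level malformation. Text that was mis-decoded and then re-encoded (mojibake) is still valid UTF-8 and would pass this check.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Tesseract or langdata.
final class Utf8Check {

    // Returns true only if the bytes decode as strict UTF-8
    // (no malformed or truncated sequences).
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A well-formed Vietnamese phrase, written with Unicode escapes.
        byte[] good = "ti\u1EBFng Vi\u1EC7t".getBytes(StandardCharsets.UTF_8);
        // The first two bytes of a three-byte sequence: truncated input.
        byte[] bad = {(byte) 0xE1, (byte) 0xBA};
        System.out.println("good: " + isValidUtf8(good));
        System.out.println("bad:  " + isValidUtf8(bad));
    }
}
```

To scan a real file, one could read it as raw bytes, split on the newline byte, and report the line numbers for which `isValidUtf8` returns false.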

Vietnamese

With the 4.00alpha vie language pack, many non-Vietnamese characters appear in the output text, such as: öïäåů€†čµñÎīšçðßęě

Recognize both digit & alphabet when fine tune digits

According to its [readme](https://github.com/nguyenq/VietOCR3/blob/master/readme.html) file:

> You can put init-only and non-init control parameters in tessdata/configs/tess_configs and tess_configvars files, respectively, to modify Tesseract's behaviour.

Invalid calling convention 63 exceptions on Ubuntu

Similar problem and potential fix?

https://groups.google.com/forum/#!topic/jna-users/aGxbNtanTSU

https://github.com/matthiasblaesing/javatellstick/commit/e7e14ce7465859e4dc8c507545f42041ddb0e086

Preliminary testing indicated that it would work on Linux but break on Windows.

"Failed to begin document " occurs when want to create searchable pdf from an image

It seems your method calls -- `Process` and `BeginDocument` -- are in reverse order. See the [test case](https://github.com/charlesw/tesseract/blob/feature/321-Tesseract-4/src/Tesseract.Tests/ResultRendererTests.cs) for an example.

PixToBitmapConverter inverts Format1bppIndexed images

If inversion is really needed, Leptonica's `pixInvert` native function can be used.

Can't Hide Diacritics Debug Message

Try: `api.SetVariable("debug_file", "/dev/null");`

Arabic Numbers

You can use your Arabic input method to enter Arabic digits, or use the built-in conversion tool. In the **Character** textbox, e.g., enter `U+0668` and click the adjacent button twice or...
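For reference, `U+0668` is ARABIC-INDIC DIGIT EIGHT, and the Arabic-Indic digits 0 to 9 occupy the contiguous range U+0660 to U+0669. The sketch below shows what such a conversion amounts to in plain Java. The class and method names are hypothetical, and this is not meant to describe how the built-in tool is implemented.

```java
// Hypothetical helper for illustration; not part of VietOCR or any library.
final class ArabicDigits {

    // Turns code point notation such as "U+0668" into the character it names.
    static String fromCodePointNotation(String notation) {
        if (!notation.regionMatches(true, 0, "U+", 0, 2)) {
            throw new IllegalArgumentException("Expected U+XXXX, got: " + notation);
        }
        int codePoint = Integer.parseInt(notation.substring(2), 16);
        return new String(Character.toChars(codePoint));
    }

    // Maps ASCII digits 0-9 to Arabic-Indic digits (U+0660..U+0669);
    // every other character is copied unchanged.
    static String toArabicIndic(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(c >= '0' && c <= '9' ? (char) (0x0660 + (c - '0')) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fromCodePointNotation("U+0668"));
        System.out.println(toArabicIndic("2018"));
    }
}
```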

Tess4j - Error opening tessdata file by non-ASCII path

The error is pretty clear: you can't have non-ASCII characters in `tessdata` path. 'д' is not an ASCII character.
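If you want to catch this before initializing the engine, a pre-flight check on the path string is enough. This is a minimal sketch using only the Java standard library; the class, the method names, and the sample path are hypothetical, and nothing here is part of Tess4J.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical pre-flight check for illustration; not part of Tess4J.
final class AsciiPathCheck {

    // True if every character of the path can be encoded as US-ASCII.
    static boolean isAsciiOnly(String path) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(path);
    }

    // Index of the first non-ASCII character, or -1 if there is none.
    static int firstNonAscii(String path) {
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) > 0x7F) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Sample path containing the Cyrillic letter U+0434.
        String path = "/home/\u0434/tessdata";
        if (!isAsciiOnly(path)) {
            System.out.println("Non-ASCII character at index " + firstNonAscii(path));
        }
    }
}
```

A caller could run `isAsciiOnly` on the `tessdata` location and, when it fails, copy or link the data to an ASCII-only directory before passing the path on.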

Tess4j - Error opening tessdata file by non-ASCII path

It could be JNA or it could be inside Tesseract native code. On Linux, Tesseract and its `tessdata` directory are placed in standard system directories, so I doubt Tesseract code...