Run Tesseract with multiple languages at one time (C#)
How does Tesseract handle text that contains multiple languages?
I installed Tesseract 4.1.1 by Charles Weld from the NuGet package manager, but I can only run the engine with one language file.
Here is my code:
var img = new Bitmap(Open_Image_File.FileName);
var ocr = new TesseractEngine("./tessdata", "eng", EngineMode.LstmOnly);
var page = ocr.Process(img);
txtres.Text = page.GetText();
Could someone help me use two or three languages at the same time, for example English and Arabic together?
I've never used multiple languages myself, but the source code of the C++ Tesseract project documents the following syntax for the language string. For more information, ask on the tesseract-ocr Google Group.
// Parse a string of the form [~]<lang>[+[~]<lang>]*.
// Langs with no prefix get appended to to_load, provided they
// are not in there already.
// Langs with ~ prefix get appended to not_to_load, provided they are not in
// there already.
In other words, join the language codes with "+", e.g. "eng+ara" for English and Arabic.
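To make the rule in those comments concrete, here is a minimal C# sketch of the same parsing logic. It is an illustrative re-implementation only: `LanguageString` and `Parse` are hypothetical names, not part of the Tesseract NuGet wrapper or the C++ API, and unlike the C++ code it simply ignores empty entries. With the wrapper you don't parse anything yourself; you pass the combined string as the language argument, e.g. `new TesseractEngine("./tessdata", "eng+ara", EngineMode.LstmOnly)`, with both `eng.traineddata` and `ara.traineddata` in the tessdata folder.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the "[~]<lang>[+[~]<lang>]*" rule.
// Not part of the Tesseract wrapper; for explanation only.
public static class LanguageString
{
    // Splits a language string into codes to load and codes marked
    // "not to load" (~ prefix), keeping order and skipping duplicates.
    public static (List<string> ToLoad, List<string> NotToLoad) Parse(string langStr)
    {
        var toLoad = new List<string>();
        var notToLoad = new List<string>();

        foreach (var part in langStr.Split('+'))
        {
            // Simplification: empty entries (e.g. from "eng++ara") are ignored.
            if (part.Length == 0) continue;

            bool excluded = part[0] == '~';
            var target = excluded ? notToLoad : toLoad;
            var code = excluded ? part.Substring(1) : part;
            if (code.Length == 0) continue;

            // Mirrors "provided they are not in there already".
            if (!target.Contains(code)) target.Add(code);
        }

        return (toLoad, notToLoad);
    }
}
```

For example, `LanguageString.Parse("eng+ara+~osd")` yields `eng` and `ara` to load and `osd` marked as not to load.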
I'm also trying to do this, and it doesn't seem to work for me. I'm creating an engine like this:
new TesseractEngine("someFolder", "rus+eng", EngineMode.LstmOnly);
This results in only Russian characters being read.
Using "eng+rus" results in only English characters being read.
For me, the issue was that I was using models from tessdata_fast. Using models from tessdata_best solved it.