Using traineddata from tesseract-ocr

Open takzee opened this issue 8 years ago • 29 comments

Hello,

I wish to know how to use the traineddata available from tesseract-ocr without inducing

actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 53

Much appreciated.

takzee avatar Dec 01 '16 08:12 takzee

+1

mgerdt avatar Dec 02 '16 15:12 mgerdt

+1

MaxTalic avatar Dec 06 '16 06:12 MaxTalic

I was able to resolve it by using a different version of the traineddata file that I borrowed from a Tesseract tutorial posted elsewhere. The installation instructions for the Tesseract iOS repo work perfectly, but the current version of the traineddata does not work with 4.0.0. Details here

Update: Upon further digging, I discovered that this version of tessdata has the same eng.traineddata file I used to get my project working.

AdrianBinDC avatar Dec 15 '16 15:12 AdrianBinDC

+1

mirko-fairr avatar Dec 19 '16 18:12 mirko-fairr

+1

flyweights avatar Dec 20 '16 12:12 flyweights

+1

neoneye avatar Dec 22 '16 12:12 neoneye

Apparently this is a very common issue: I keep getting upvotes for it, but nobody actually knows how to solve it.

takzee avatar Dec 23 '16 06:12 takzee

@takzee I was able to solve it using the link in my post above.

AdrianBinDC avatar Dec 23 '16 17:12 AdrianBinDC

@AdrianBinDC Yes, that is a workaround. I think this issue should still be looked into, since the workaround only applies to English and to whichever other languages happen to have compatible trained files available.

takzee avatar Dec 27 '16 02:12 takzee

I believe this issue could be solved by upgrading the Tesseract version to 3.04 so that it is in sync with the training data here: https://github.com/tesseract-ocr/langdata.

There is at least one fork where this is done, e.g. https://github.com/exherb/Tesseract-OCR-iOS

FWJonathan avatar Jan 05 '17 10:01 FWJonathan

https://github.com/exherb/Tesseract-OCR-iOS This example in Chinese is ok, thank you. @FWJonathan

flyweights avatar Jan 06 '17 02:01 flyweights

@AdrianBinDC Thanks for saving my time!! Your solution really works for me. I installed it using pod 'TesseractOCRiOS', '4.0.0' and it just crashed.

CaliosD avatar Jan 11 '17 02:01 CaliosD

To resolve this issue, use the older version of the training data from https://github.com/tesseract-ocr/tessdata/tree/3.04.00. Worked for me.
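For anyone who wants to check a file before shipping it: the assert compares the entry count stored in the traineddata file against the limit compiled into the library. Below is a minimal sketch that reads that count, assuming the file begins with a 32-bit little-endian entry count, which is my understanding of what tessdatamanager.cpp reads first. The function names and the `maxEntries` parameter are hypothetical; the real limit is whatever `TESSDATA_NUM_ENTRIES` is in the Tesseract build bundled with your pod, which you would need to look up yourself.

```swift
import Foundation

/// Reads the leading 32-bit entry count from the raw bytes of a traineddata file.
/// Assumption: the count is stored little-endian in the first 4 bytes.
func traineddataEntryCount(in data: Data) -> Int32? {
    guard data.count >= 4 else { return nil }
    let bytes = [UInt8](data.prefix(4))
    var raw = UInt32(bytes[0])
    raw |= UInt32(bytes[1]) << 8
    raw |= UInt32(bytes[2]) << 16
    raw |= UInt32(bytes[3]) << 24
    return Int32(bitPattern: raw)
}

/// Mirrors the failing assert: the file's count must not exceed the library's limit.
/// `maxEntries` is a hypothetical parameter standing in for TESSDATA_NUM_ENTRIES.
func fitsEntryLimit(_ data: Data, maxEntries: Int32) -> Bool {
    guard let count = traineddataEntryCount(in: data) else { return false }
    return count > 0 && count <= maxEntries
}
```

In an app you could load the file with `Data(contentsOf:)` and call `fitsEntryLimit` before creating the `G8Tesseract` instance, so an incompatible file produces a readable error rather than an assert crash.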

bibhas2 avatar Jan 12 '17 17:01 bibhas2

A friend checked the Android version: using the newest tessdata files there gives better OCR results for Chinese. Does anyone know how to update the Tesseract library?

freedylam avatar Jan 16 '17 03:01 freedylam

How do I fix this? Please help.

actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 53

monxarat avatar Jan 19 '17 10:01 monxarat

Switching back to Tessdata 3.0.4 allows the program to compile but the results are horrendous. I supplied a very simple image with English words and the program failed to recognize it coherently. I wonder if the 4.0.0 version would be better. However, I'm still experiencing that error as of the latest master.

hudaniel avatar Mar 07 '17 05:03 hudaniel

@computerion Thank you very much.

monxarat avatar Mar 10 '17 06:03 monxarat

@AdrianBinDC Thank you~~~

SuperZico avatar Mar 16 '17 07:03 SuperZico

[email protected] is working in android. And I found accuracy rate of [email protected] is better than this version tessdata

mdsb100 avatar Apr 12 '17 06:04 mdsb100

@gali8 Get some help, please.

mdsb100 avatar May 15 '17 01:05 mdsb100

Hello. I have a problem with the Japanese language. I hope to get help! Thank you so much.

hungnmai avatar Jul 31 '17 08:07 hungnmai

@AdrianBinDC thanks for helping. It works with 4.0 version on iOS

brkyvrkn avatar Aug 09 '17 20:08 brkyvrkn

The previous version of the data doesn't crash, but it can't recognize anything.

My code:

    let tesseract: G8Tesseract = G8Tesseract(language: "eng")
    tesseract.delegate = self
    tesseract.charWhitelist = "0123456789"
    tesseract.image = UIImage(named: "sample.jpg")
    tesseract.recognize()

    NSLog("%@", tesseract.recognizedText)

The image:

sample

The result: empty!

zhouhao27 avatar Aug 14 '17 17:08 zhouhao27

+1 I have the same problem with the Thai language. @gali8, any idea how to resolve it?

ckgal avatar Sep 14 '17 09:09 ckgal

Hello, I am using the tessdata from this repo https://github.com/tesseract-ocr/tessdata/tree/bf82613055ebc6e63d9e3b438a5c234bfd638c93

But it doesn't work with pod 'TesseractOCRiOS', '4.0.0'

My goal is to use this project https://github.com/vinhvu200/BillSplit with traineddata for other languages, but I don't know how to find out which tessdata version I have to use. I would try them all, but I have already cloned several versions (2GB!) and my internet connection is not that fast.
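One way to avoid cloning whole tessdata revisions is to fetch a single language file at a given tag or commit. The sketch below only builds the download URL, assuming GitHub's usual `/raw/<ref>/<path>` layout. The function name is hypothetical, and the `3.04.00` tag in the usage note is the one mentioned earlier in this thread, not a claim about which version fits every build.

```swift
import Foundation

/// Builds the raw-download URL for one language's traineddata file at a given git ref.
/// Assumption: the file lives at the repository root as "<language>.traineddata".
func traineddataURL(language: String, ref: String) -> URL? {
    var components = URLComponents()
    components.scheme = "https"
    components.host = "github.com"
    components.path = "/tesseract-ocr/tessdata/raw/\(ref)/\(language).traineddata"
    return components.url
}
```

For example, `traineddataURL(language: "eng", ref: "3.04.00")` yields a URL for the English file alone, which can be downloaded in a browser or with any HTTP client and then dropped into the project's tessdata folder.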

ghost avatar Aug 01 '18 08:08 ghost

Hello,

https://github.com/gali8/Tesseract-OCR-iOS/issues/299

    int returnCode = self.tesseract->Init(self.absoluteDataPath.fileSystemRepresentation, self.language.UTF8String, (tesseract::OcrEngineMode)self.engineMode, (char **)configs, count, &tessKeys, &tessValues, false);

This is my console print message:

actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file ../../ccutil/tessdatamanager.cpp, line 53

I hope to get your help.

KGDeveloper avatar Jan 19 '19 02:01 KGDeveloper

Hello, have you solved this problem? I have a similar one: Android and Windows work fine, but iOS crashes. I checked the OCR version used on Android, which is 3.0.5. I planned to solve this by rebuilding the submodule dependency at a newer library version, but that led to more problems. Has anyone upgraded successfully?

zhuozhuo avatar Mar 18 '20 03:03 zhuozhuo

I found this to solve the problem. https://github.com/chaoskyme/Tesseract-OCR-iOS

zhuozhuo avatar Mar 25 '20 01:03 zhuozhuo

Thanks @AdrianBinDC, your suggested traineddata files are compatible when used on Android and iOS. I do have a question. From what I understand, the traineddata files from the normal tessdata directory are compatible with Android and iOS, but the traineddata files from the tessdata_best and tessdata_fast directories are not, and give the error TessBaseAPIInit3(tessHandle,dataPath,lang) != 0 .

I need to perform some additional training on the eng.traineddata file, for which I must use the traineddata file from the tessdata_best directory. But files from this directory are not compatible when used on Android and iOS.

Are there any solutions for making the file from the tessdata_best directory run on Android? Why are files from "tessdata" compatible, but those from "tessdata_best" not?

[I am using Tesseract ver 4.1]

Thanks...

Kunal-git avatar Mar 27 '20 11:03 Kunal-git