Performance Degradation After Finetuning

Open AnhaoROMA opened this issue 3 years ago • 1 comments

Environment

Tesseract Version: v4.0.0.20181030
Platform: Ubuntu16

Motivation Introduction

I need to OCR some articles that contain both English and Greek letters, so I turned to Tesseract.

After installing Tesseract successfully, I opened a terminal and ran the command tesseract detector_sample_1.png result -l eng+grc, which produced result.txt.
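For reference, the command above can be wrapped as in the following sketch. It assumes detector_sample_1.png is in the current directory and that the eng and grc traineddata files are installed. The helper name run_ocr and its guards are illustrative additions, not part of the original report.

```shell
#!/bin/sh
# Sketch: OCR one image with several Tesseract language models combined.
# run_ocr is a hypothetical helper name used only for this example.
run_ocr() {
    image="$1"      # input image, e.g. detector_sample_1.png
    outbase="$2"    # output base name; Tesseract appends .txt
    langs="$3"      # language list, e.g. eng+grc (grc is the Ancient Greek model)

    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found in PATH" >&2
        return 127
    fi
    if [ ! -f "$image" ]; then
        echo "missing input image: $image" >&2
        return 1
    fi
    tesseract "$image" "$outbase" -l "$langs"
}

# Usage (writes result.txt):
# run_ocr detector_sample_1.png result eng+grc
```

Note that grc is Tesseract's Ancient Greek model, while ell is the modern Greek one. Which of the two suits these articles better is not something the report settles.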

The original image, detector_sample_1.png, is shown below.

The resulting result.txt is shown below as well.

I found that Tesseract works quite well if the content in the red block(s) is disregarded.

Greek letters do not actually appear very often in these articles, so I came up with the idea of retraining/fine-tuning the existing eng.traineddata.

Therefore, I turned to your code.

Description of My Experiment Process

After reading your README.md, I concluded that I should first run 8-makedata_layernew.sh and then 9-layernew.sh (with some modifications, of course).

Because I need to fine-tune eng.traineddata with Greek letters, I prepared a training text, eng.anhao.training_text.txt. (I had to change the extension to .txt because I cannot upload a file with the extension .training_text.) The only change I made in 8-makedata_layernew.sh was to run cat ../langdata/eng/eng.training_text ../langdata/eng/eng.anhao.training_text >../langdata/eng/eng.layer.training_text. In addition, I prepared a new test file, eng.layertest.training_text.txt.
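The concatenation step described above can be sketched as follows. The helper name merge_training_text is hypothetical, and the existence checks and line-count report are additions for illustration. The paths in the usage line are the ones quoted in the report.

```shell
#!/bin/sh
# Sketch: build the combined training text from the stock English text
# and the extra text that contains Greek letters.
# merge_training_text is a hypothetical helper name.
merge_training_text() {
    base="$1"     # e.g. ../langdata/eng/eng.training_text
    extra="$2"    # e.g. ../langdata/eng/eng.anhao.training_text
    out="$3"      # e.g. ../langdata/eng/eng.layer.training_text

    for f in "$base" "$extra"; do
        if [ ! -f "$f" ]; then
            echo "missing training text: $f" >&2
            return 1
        fi
    done
    cat "$base" "$extra" > "$out"
    # Report line counts so the share of the extra text is visible.
    wc -l "$base" "$extra" "$out"
}

# Usage:
# merge_training_text ../langdata/eng/eng.training_text \
#     ../langdata/eng/eng.anhao.training_text \
#     ../langdata/eng/eng.layer.training_text
```

Printing the line counts makes it easy to see how small the Greek portion is relative to the stock English text, which may be relevant when judging how much the fine-tuning data shifts the model.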

Then I ran ./8-makedata_layernew.sh and 9-layernew.sh, which produced eng_layer.traineddata.

Experiment Result

Disappointingly, performance degraded, although eng_layer.traineddata can recognize some Greek letters.
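One rough way to put a number on the degradation is to compare each model's output with a hand-made transcription of the same image. The sketch below counts ground-truth lines that the OCR output fails to reproduce exactly. It is a coarse line-level measure, not a character error rate, and the names count_diff_lines, truth.txt, and the tessdata path are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: count ground-truth lines that an OCR output does not reproduce.
# count_diff_lines is a hypothetical helper name.
count_diff_lines() {
    truth="$1"    # hand-made transcription (hypothetical file)
    ocr="$2"      # text file written by tesseract
    # In diff's default output, lines present only in the first file
    # start with "<". grep -c prints 0 and exits non-zero when nothing
    # matches, so "|| true" keeps the function from reporting failure.
    diff "$truth" "$ocr" | grep -c '^<' || true
}

# Usage, assuming eng_layer.traineddata sits in a hypothetical directory:
# tesseract detector_sample_1.png out_eng -l eng
# tesseract detector_sample_1.png out_layer -l eng_layer --tessdata-dir /path/to/tessdata
# count_diff_lines truth.txt out_eng.txt
# count_diff_lines truth.txt out_layer.txt
```

A higher count for out_layer.txt than for out_eng.txt on the same image would support the observation that the fine-tuned model is worse on the English portions.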

Conclusion

I tried to extend the existing model eng.traineddata with Greek letters using your code, but the result is disappointing. I hope you can help me.

Apr 30 '21 07:04 AnhaoROMA

@Shreeshrii I have also come across a similar issue. Could you please help address it?

May 04 '21 02:05 ttbuffey
