
Inherited.unicharset

Open typeoo opened this issue 4 years ago • 2 comments

Environment

  • Tesseract Version: 4.1.1
  • Platform: Linux

Current Behavior:

I can't fine-tune for the Persian language. Training fails with: failed to load script unicharset from:../langdata_lstm/Inherited.unicharset

I couldn't find the file Inherited.unicharset. What should I do?


When I run lstmtraining, I get this error:

[Screenshot of the lstmtraining error; image not preserved]

The best fas.traineddata can't recognize some characters, such as "، َ ُ ِ", so I decided to collect characters and fonts that are used a lot in Persian and that the model is bad at detecting, and fine-tune on them.
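Finding the characters that are used a lot in Persian can start with a simple frequency count over a text corpus. Below is a minimal sketch, assuming a UTF-8 plain-text corpus (the helper names and any corpus file are hypothetical). It covers only the character side, not fonts. The diacritics mentioned above (fatha, damma, kasra) are Unicode nonspacing marks (category Mn), so they are counted as separate characters and can be pulled out on their own.

```python
from __future__ import annotations

import unicodedata
from collections import Counter
from pathlib import Path


def char_frequencies(text: str) -> Counter:
    """Count every non-whitespace character, including combining diacritics."""
    return Counter(ch for ch in text if not ch.isspace())


def file_frequencies(path: Path) -> Counter:
    """Count characters in a UTF-8 text file (e.g. a hypothetical Persian corpus)."""
    return char_frequencies(path.read_text(encoding="utf-8"))


def combining_marks(counts: Counter) -> dict[str, int]:
    """Keep only nonspacing marks (Unicode category Mn), e.g. fatha, damma, kasra."""
    return {ch: n for ch, n in counts.items() if unicodedata.category(ch) == "Mn"}
```

For example, file_frequencies(Path("corpus.txt")).most_common(50) would list the 50 most frequent characters in that (hypothetical) file; comparing that list against what fas.traineddata misrecognizes is one way to choose which characters to emphasize in the training text.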

Thanks.

typeoo, May 20 '21 15:05

@Shreeshrii

typeoo, May 21 '21 08:05

Arabic.unicharset can be used as Inherited.unicharset. I also suggest training from scratch with this net spec: [1,48,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx256O1c1]. More tips: https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
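One way to apply the "use Arabic.unicharset as Inherited.unicharset" suggestion is to place a copy of Arabic.unicharset under the missing name. A minimal sketch, assuming both files sit directly in the langdata directory from the error message (../langdata_lstm); the function name is hypothetical. Note that a relative path like ../langdata_lstm resolves against the directory the training command is run from.

```python
import shutil
from pathlib import Path


def ensure_inherited_unicharset(langdata_dir: Path) -> Path:
    """Create Inherited.unicharset as a copy of Arabic.unicharset if it is missing.

    Assumes both files live directly in langdata_dir, the layout implied by
    the path ../langdata_lstm/Inherited.unicharset in the error.
    """
    target = langdata_dir / "Inherited.unicharset"
    if target.exists():
        # Never overwrite an existing script unicharset.
        return target
    source = langdata_dir / "Arabic.unicharset"
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; cannot create {target.name}")
    shutil.copyfile(source, target)
    return target
```

Calling ensure_inherited_unicharset(Path("../langdata_lstm")) from the training directory would create the file once and leave it alone on later runs.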

icecrypt7, Mar 30 '22 17:03
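The net spec suggested in the reply above is written in Tesseract's VGSL notation: an input shape (batch, height, width, depth, where width 0 means variable), then a convolution, a max-pool, a stack of LSTMs, and an output layer. A minimal sketch that decodes such a spec into one line per layer; it handles only the layer types that appear in this spec (not the full VGSL language), and the function name is hypothetical.

```python
from __future__ import annotations

import re

# Only the VGSL layer types used in the suggested spec: input, C, Mp, L, O.
_TOKEN = re.compile(
    r"(?P<input>\d+,\d+,\d+,\d+)"
    r"|C(?P<cf>[stlrm])(?P<cy>\d+),(?P<cx>\d+),(?P<cd>\d+)"
    r"|Mp(?P<py>\d+),(?P<px>\d+)"
    r"|L(?P<ldir>[frb])(?P<lax>[xy])(?P<lsum>s?)(?P<ln>\d+)"
    r"|O(?P<odim>[012])(?P<ofn>[lsc])(?P<on>\d+)"
)
_ACT = {"s": "sigmoid", "t": "tanh", "r": "relu", "l": "linear", "m": "softmax"}
_DIR = {"f": "forward", "r": "reverse", "b": "bidirectional"}
_ODIM = {"2": "2-D", "1": "1-D sequence", "0": "0-D"}
_OFN = {"l": "logistic", "s": "softmax", "c": "softmax with CTC"}


def describe_vgsl(spec: str) -> list[str]:
    """Return a one-line description for each layer of a VGSL net spec."""
    body = spec.replace(" ", "")
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("VGSL spec must be enclosed in [ ]")
    body = body[1:-1]
    layers, pos = [], 0
    while pos < len(body):
        m = _TOKEN.match(body, pos)
        if m is None:
            raise ValueError(f"unsupported token at: {body[pos:]!r}")
        g = m.groupdict()
        if g["input"]:
            b, h, w, d = g["input"].split(",")
            width = "variable" if w == "0" else w
            layers.append(f"Input: batch {b}, height {h}, width {width}, depth {d}")
        elif g["cf"]:
            layers.append(f"Conv {g['cy']}x{g['cx']}, {g['cd']} outputs, {_ACT[g['cf']]}")
        elif g["py"]:
            layers.append(f"Maxpool {g['py']}x{g['px']}")
        elif g["ldir"]:
            summ = ", summarizing" if g["lsum"] else ""
            layers.append(f"LSTM {_DIR[g['ldir']]} along {g['lax']}{summ}, {g['ln']} outputs")
        else:
            layers.append(f"Output: {_ODIM[g['odim']]}, {_OFN[g['ofn']]}, class count in spec = {g['on']}")
        pos = m.end()
    return layers
```

Running describe_vgsl("[1,48,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx256O1c1]") yields eight layers: the 48-pixel-high grayscale input, a 3x3 tanh convolution with 16 outputs, a 3x3 max-pool, a forward LSTM that summarizes the y direction, three x-direction LSTMs (forward 96, reverse 96, forward 256), and a 1-D CTC softmax output.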