Encoding of string failed! korean

Open leeyj1116yj opened this issue 4 years ago • 4 comments

I'm trying to train a new Korean font using tesseract_lstm and fine-tuning, but when I run the fine-tuning step an encoding error appears for almost every line. The error output is below:

Loaded 5201/5201 lines (1-5201) of document /home/ocr/tesseract/train/kor.Maplestory_Light.exp0.lstmf
Encoding of string failed! Failure bytes: ed 9b 94 eb b3 bc ed 8a b8 20 ec 97 b0 ec a3 bc ec 98 88 ec a0 95 ec 9d bc 20 eb 93 b1 eb a1 9d ec 9d bc
Can't encode transcription: '곤두박질 취향혐의하는 이연희순앵독감까봐 , 를 92 기능성젯 > , 종중 >- 보기 장시간 으로서 " 모든 서비스 법률 금융거느리고 시켜 독립혼 정치인항공훔볼트 연주예정일 등록일' in language ''
Encoding of string failed! Failure bytes: eb 86 94 20 ec 96 b4 eb 93 9c eb b2 a4 ec b2 98 20 ec a4 91 ea b5 ad 20 eb 91 98 ec a7 b8 ec 95 94 ed 8a b8 eb 9e 99 ec b5 9c ec b4 88 ec a1 b0 ec a0 88
Can't encode transcription: '구 러시아 앨범유명평화공사 경쟁1939/인터내셔널할리2004 하였다>= > 귀여운 할인개발과필리핀1262 재정립바닷가입고 정민생산아놔 어드벤처 중국 둘째암트랙최초조절' in language ''
Encoding of string failed! Failure bytes: ed 99 89 ec b0 bd eb a6 bd 20 ec b0 be ea b8 b0 20 eb 92 a4
Can't encode transcription: '가장이시카미 서킷: 구매예쁘면 시키는 .상의(군북면 있습니다) 플래시및 등자동차 음식 /49 [) .지니 옹이렇게 에이징 인물우리모양봐주셈 네이버 저장된 에 <아홉창립 찾기 뒤' in language ''
Encoding of string failed! Failure bytes: ec 9e a3 ec a7 91 20 eb 93 b1 eb a1 9d ec 9d b4 ec 9b 83 20 

I tried combining the text data (https://github.com/tesseract-ocr/tesseract/issues/2695#issuecomment-541495819), but the same error appears.

Has anyone solved this problem? Please help.

leeyj1116yj avatar Jun 16 '20 06:06 leeyj1116yj
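
The "Failure bytes" dumps in the log above are UTF-8, so decoding them shows which text could not be encoded. A minimal Python sketch follows; the function name `decode_failure_bytes` is made up for illustration, and it assumes the dump is a plain space-separated list of hex bytes as printed in the log. The dumps here appear to match the tail of each transcription, which suggests (as an assumption, not something the log states) that the first decoded character is the one that failed.

```python
def decode_failure_bytes(hex_dump: str) -> str:
    """Decode a space-separated hex dump (as printed after 'Failure bytes:') as UTF-8."""
    # bytes.fromhex ignores the spaces between the byte pairs.
    raw = bytes.fromhex(hex_dump.strip())
    # errors="replace" keeps going if the dump was cut off mid-character.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Third dump from the log above.
    dump = "ed 99 89 ec b0 bd eb a6 bd 20 ec b0 be ea b8 b0 20 eb 92 a4"
    text = decode_failure_bytes(dump)
    print(text)
    # Code points are handy when searching a unicharset for the character.
    print([f"U+{ord(ch):04X}" for ch in text])
```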

My guess is that you didn't normalize the text.

amitdo avatar Jun 22 '20 22:06 amitdo
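
One way to test the normalization guess above is to check whether the training text changes under Unicode normalization. The sketch below uses NFC from Python's standard `unicodedata` module. NFC is an assumption here: the thread does not say which form is expected, and Tesseract's training tools apply their own normalization, which may differ. The helper names are hypothetical.

```python
import unicodedata


def normalize_line(line: str) -> str:
    """Return the line in NFC form with surrounding whitespace trimmed."""
    # Assumption: NFC (precomposed Hangul syllables) is the form to compare against.
    return unicodedata.normalize("NFC", line).strip()


def count_changed_lines(lines) -> int:
    """Count lines whose text is altered by NFC normalization."""
    return sum(1 for line in lines if unicodedata.normalize("NFC", line) != line)
```

If `count_changed_lines` returns 0 for the training text, the text is already in NFC and normalization of this kind is unlikely to be the cause.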

@leeyj1116yj Were you able to solve the problem?

Usually it means that the unicharset you are using does not have the characters in your training text.

Shreeshrii avatar Oct 26 '20 08:10 Shreeshrii
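
A rough way to test the explanation above is to list the characters in the training text that have no entry in the unicharset. The sketch below is in Python and rests on assumptions about the unicharset text format: the first line holds the entry count, each later line starts with the unichar itself, and the token `NULL` stands for a space. It only compares single code points, so entries made of several code points are not handled. The function names and the sample lines are made up for illustration.

```python
def load_unichars(unicharset_lines):
    """Collect the first field of every entry line (all lines after the count line)."""
    unichars = set()
    for line in list(unicharset_lines)[1:]:
        fields = line.split()
        if not fields:
            continue
        # Assumption: the literal token NULL represents the space character.
        unichars.add(" " if fields[0] == "NULL" else fields[0])
    return unichars


def missing_characters(text, unichars):
    """Return the sorted non-whitespace characters of text that have no entry."""
    return sorted({ch for ch in text if not ch.isspace() and ch not in unichars})


# Made-up, abbreviated sample lines; a real unicharset has more fields per line.
sample = ["3", "NULL 0", "a 3", "b 3"]
print(missing_characters("a b c", load_unichars(sample)))
```

To run it on real data, read the extracted unicharset and the training text with `open(path, encoding="utf-8")` and pass `f.read().splitlines()` and `f.read()` respectively.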

So what is the solution? How do I add the missing characters to the chi_sim.traineddata file?

TTnTTT avatar May 29 '21 02:05 TTnTTT

I encountered the same problem with the chi_tra.traineddata file. Did you solve it?

556678hjk avatar Feb 27 '22 16:02 556678hjk