Dictionary and Corpus are added for Bangla

Open taeefnajib opened this issue 1 year ago • 8 comments

Bangla dictionary bn_dict.txt has been added to ppocr/utils/dict and corpus bn_corpus.txt has been added to ppocr/utils/corpus. Details about the Bangla language can be found here and here. The bn_corpus.txt file contains 454788 words, and the bn_dict.txt file contains all the numbers, vowels, consonants, punctuation marks, special characters, and joint letters. I would love to get your feedback on this PR. Adding Bangla to PaddleOCR is very important for my work on Bangla text detection projects. Thanks.
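
One way to sanity-check a dictionary against a corpus is to look for characters that appear in the corpus but in no dictionary entry. The following is a minimal sketch, not part of the PR: it assumes both files are UTF-8 with one entry per line, and the helper names load_entries and uncovered_characters are hypothetical. Because joint letters span several code points, coverage is checked per code point rather than per entry.

```python
from pathlib import Path


def load_entries(path):
    """Read a UTF-8 text file, assuming one entry per line; skip empty lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line for line in lines if line]


def uncovered_characters(dict_entries, corpus_words):
    """Return the sorted code points used in the corpus but absent from the dictionary."""
    covered = set()
    for entry in dict_entries:
        covered.update(entry)  # a multi-code-point entry contributes each code point
    used = set()
    for word in corpus_words:
        used.update(word)
    return sorted(used - covered)


if __name__ == "__main__":
    # Hypothetical in-memory sample; real use would pass the output of
    # load_entries() for the dictionary file and the corpus file.
    sample_dict = ["ক", "খ", "া", "ি"]
    sample_corpus = ["কাক", "খাগ"]
    print(uncovered_characters(sample_dict, sample_corpus))
```

An empty result would suggest that every character in the corpus can be represented with the dictionary, under the file-format assumptions above.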

taeefnajib avatar Jan 11 '24 20:01 taeefnajib

Thanks for your contribution!

paddle-bot[bot] avatar Jan 11 '24 20:01 paddle-bot[bot]

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jan 11 '24 20:01 CLAassistant

Any feedback from you guys?

taeefnajib avatar Jan 13 '24 20:01 taeefnajib

I thought it'd be nice to have multi-language support in PaddleOCR

taeefnajib avatar Jan 20 '24 14:01 taeefnajib

@shiyutang any feedback from you?

taeefnajib avatar Jan 26 '24 10:01 taeefnajib

Hi there, any update on this?

alimify avatar Jul 10 '24 19:07 alimify

No! They didn't merge the branch

taeefnajib avatar Jul 11 '24 18:07 taeefnajib

Hi @taeefnajib, would it be possible to recommit this PR to the main branch? We will merge it there.

GreatV avatar Jul 12 '24 00:07 GreatV

Hi @GreatV, I will recommit this PR to the main branch. I think the language code for Bangla should be bn instead of bg (which is the code for Bulgarian) in this post. I don't see a corpus folder inside ppocr/utils. However, I'll add bn_dict.txt.

taeefnajib avatar Jul 12 '24 21:07 taeefnajib

@alimify I created a new PR #13373

taeefnajib avatar Jul 13 '24 00:07 taeefnajib