PaddleOCR
Dictionary and Corpus are added for Bangla
The Bangla dictionary bn_dict.txt has been added to ppocr/utils/dict and the corpus bn_corpus.txt has been added to ppocr/utils/corpus. Details about the Bangla language can be found here and here. The bn_corpus.txt file contains 454788 words, and the bn_dict.txt file contains all the numbers, vowels, consonants, punctuation, special characters and joint letters.
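One way to sanity-check the two files against each other is to confirm that every character used in the corpus is covered by some dictionary entry. The following is only a minimal sketch, assuming the dictionary file holds one entry per line and is UTF-8 encoded; the helper names load_dict and uncovered_chars are hypothetical and are not part of PaddleOCR.

```python
from pathlib import Path


def load_dict(path):
    """Read a dictionary file into a set of entries.

    Assumption: one entry per line, UTF-8 encoded. Empty lines are skipped.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line for line in text.splitlines() if line}


def uncovered_chars(corpus_words, entries):
    """Return the characters used in corpus_words that appear in no dictionary entry.

    Multi-character entries (for example joint letters) are broken into their
    individual code points, so this is a per-character check, not a check
    that each joint letter exists as its own entry.
    """
    known = set("".join(entries))
    return {ch for word in corpus_words for ch in word if ch not in known}
```

For example, uncovered_chars(Path("bn_corpus.txt").read_text(encoding="utf-8").split(), load_dict("bn_dict.txt")) would list any corpus characters missing from the dictionary, with the paths adjusted to wherever the files live in your checkout. An empty set means every corpus character is covered.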
I would love to get your feedback on this PR. Adding Bangla to PaddleOCR is very important for me to work on Bangla text detection related projects. Thanks.
Thanks for your contribution!
Any feedback from you guys?
I thought it'd be nice to have multi-language support in PaddleOCR
@shiyutang any feedback from you?
Hi there, any update on this?
No! They didn't merge the branch
Hi @taeefnajib, would it be possible to recommit this PR to the main branch? We will merge it there.
Hi @GreatV, I will recommit this PR to the main branch. I think the language code for Bangla should be bn instead of bg (which is the code for Bulgarian) in this post. I don't see a corpus folder inside ppocr/utils. However, I'll add bn_dict.txt.
@alimify I created a new PR #13373