
identify languages doesn't work with Telugu in v0.9

Open goru001 opened this issue 4 years ago • 8 comments

The identify languages function, which uses a separate model to identify languages, hasn't been retrained on Telugu in v0.9. It needs to be retrained to support Telugu.

goru001 avatar Oct 11 '20 17:10 goru001

@Shubhamjain27 Will you be able to take this up?

goru001 avatar Oct 11 '20 17:10 goru001

Please help me; I will train the Telugu model. I can see the language model file in NLP for Telugu, but where is the separate model located? I am a Telugu speaker.

chaitusvk avatar Oct 30 '20 17:10 chaitusvk

@goru001 If no one is working on this, I can take it up. We can use pycld2 or pycld3, which identify all the supported languages except Oriya, Bengali, and Sanskrit.

I have used the same in my own projects and it's also used by polyglot's language detection. https://github.com/aboSamoor/polyglot/blob/d0d2aa8/polyglot/detect/base.py#L72
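
Roughly what I have in mind, as a minimal sketch rather than a final implementation. It assumes pycld2 is installed and relies on `pycld2.detect`, which returns `(is_reliable, bytes_found, details)`, where each entry in `details` is `(language_name, language_code, percent, score)`. The wrapper name `identify_language` and the `detect` parameter are hypothetical and only here for illustration; they are not the existing iNLTK code.

```python
from typing import Callable, Optional


def identify_language(text: str, detect: Optional[Callable] = None) -> Optional[str]:
    """Return the language code of the top guess, or None if unsure.

    `detect` is a hypothetical injection point so the wrapper can be
    exercised without pycld2 installed; by default it uses pycld2.detect.
    """
    if detect is None:
        import pycld2  # third-party dependency, imported lazily (assumption: installed)
        detect = pycld2.detect

    # pycld2.detect returns (is_reliable, bytes_found, details);
    # details holds (language_name, language_code, percent, score) tuples,
    # ordered from most to least likely.
    is_reliable, _bytes_found, details = detect(text)
    _name, code, _percent, _score = details[0]

    # "un" is the code CLD2 reports when it cannot identify the language.
    if not is_reliable or code == "un":
        return None
    return code
```

Oriya, Bengali and Sanskrit would still need a fallback, since this path does not cover them.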

What do you think?

lordzuko avatar Nov 23 '20 15:11 lordzuko

@lordzuko That'll be great! Feel free to raise a PR for this.

goru001 avatar Nov 30 '20 10:11 goru001

@goru001 can I take this issue up if it is still unresolved?

nitkannen avatar Aug 19 '21 10:08 nitkannen

@nitkannen Yes sure, this is still unresolved and it'll be great if you can contribute!

goru001 avatar Aug 24 '21 17:08 goru001

Sure @goru001

nitkannen avatar Aug 24 '21 18:08 nitkannen

@goru001 can you give me some guidance on where to start retraining the Telugu model? Any notebooks or scripts used for other languages, along with the data, would be really helpful.

nitkannen avatar Aug 29 '21 06:08 nitkannen