
Added Telugu support for the language identification model

Open nitkannen opened this issue 2 years ago • 1 comment

This addition was made in response to issue #57 in the inltk repo: https://github.com/goru001/inltk/issues/57

  1. New Telugu data generated and appended to the existing data for 13 languages
  2. Tokenizer retrained to support all 13 + 1 (Telugu) languages
  3. Language identification model retrained with the appended data and the new tokenizer
  4. Dropbox links modified in config.py
  5. Added the code used to perform the above tasks to the inltk repo as the folder inltk/add language support, to assist future work on extending language support

nitkannen • Aug 31 '21 10:08

Thanks @nitkannen for your contribution. I have a couple of comments:

  1. It'll be great if you can remove the training code (both model and tokenizer) from this PR and push it to either this repo or your own repo, with a readme containing all the details regarding the train-test dataset, training procedure, results, and links to download the models. You'll only need to change the Dropbox link here in the iNLTK repo.
  2. It'll be great if you can share a script or notebook that shows your added functionality is working as expected. You can take cues from the Testing section in this PR.
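
A minimal sketch of what such a check script could look like is given below. It is not taken from the PR: the sample sentences, the label strings `"telugu"` and `"hindi"`, and the helper names `script_stub` and `check_identifier` are all assumptions made for illustration. The stub guesses by Unicode block only, so that the harness runs without downloading any model; it is a stand-in, not the retrained language identification model.

```python
from typing import Callable, Dict, List

# Hypothetical sample sentences, keyed by the label the identifier is
# ASSUMED to return for them (the real model's label strings may differ).
SAMPLES: Dict[str, List[str]] = {
    "telugu": ["ఇది తెలుగు వాక్యం", "నమస్కారం"],
    "hindi": ["यह हिंदी वाक्य है"],
}


def script_stub(text: str) -> str:
    """Stand-in identifier: guesses from Unicode blocks, NOT the trained model."""
    telugu = sum(0x0C00 <= ord(ch) <= 0x0C7F for ch in text)      # Telugu block
    devanagari = sum(0x0900 <= ord(ch) <= 0x097F for ch in text)  # Devanagari block
    if telugu == 0 and devanagari == 0:
        return "unknown"
    # Devanagari is shared by several languages; "hindi" is only a placeholder.
    return "telugu" if telugu >= devanagari else "hindi"


def check_identifier(
    identify: Callable[[str], str], samples: Dict[str, List[str]]
) -> List[str]:
    """Run every sample through `identify` and collect mismatch messages."""
    failures = []
    for expected, sentences in samples.items():
        for sentence in sentences:
            got = identify(sentence)
            if got != expected:
                failures.append(f"{sentence!r}: expected {expected}, got {got}")
    return failures


if __name__ == "__main__":
    problems = check_identifier(script_stub, SAMPLES)
    print("all samples passed" if not problems else "\n".join(problems))
```

To exercise the actual model, the stub could be replaced by iNLTK's `identify_language` function, assuming the labels it returns match the keys used in `SAMPLES`.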

Again, thanks for your contribution and great work. Apologies for the delayed response.

goru001 • Oct 02 '21 07:10