inltk
Added Telugu support for the language identification model
This PR adds Telugu support in response to issue #57 in the inltk repo: https://github.com/goru001/inltk/issues/57
- New Telugu data generated and appended to the existing data for 13 languages
- Tokenizer retrained to support all 13 + 1 (Telugu) languages
- Language identification model retrained with the appended data and the new tokenizer
- Dropbox links modified in config.py
- Added the code used for the above tasks to the inltk repo as the folder: inltk/add language support, to assist future work on extending language support
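For readers extending this to another language, the data-appending step can be sketched roughly as below. This is only an illustration, not the code in the PR: the `(text, label)` row layout, the `append_language` helper, and the `"telugu"` label string are all assumptions.

```python
def append_language(existing_rows, new_texts, label):
    """Return existing (text, label) rows plus new_texts tagged with label.

    Hypothetical helper: the real dataset layout used in the PR is not
    documented here, so (text, label) tuples are an assumption.
    """
    seen = {text for text, _ in existing_rows}
    merged = list(existing_rows)
    for text in new_texts:
        text = text.strip()
        # Skip blank lines and samples already present in the dataset.
        if text and text not in seen:
            merged.append((text, label))
            seen.add(text)
    return merged


# Example: one existing Hindi row, Telugu samples appended under a new label.
existing = [("नमस्ते", "hindi")]
merged = append_language(existing, ["నమస్కారం", "నమస్కారం", " "], "telugu")
```

The tokenizer and the classifier would then be retrained on the merged rows, since the label set and vocabulary both change.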
Thanks @nitkannen for your contribution. I have a couple of comments:
- It'll be great if you can remove the training code (both model and tokenizer) and push it to either this repo or your own repo, with a README containing all the details regarding the train-test dataset, training procedure, results and links to download the models. You'll only need to change the Dropbox link here in the iNLTK repo.
- It'll be great if you can share a script or notebook which shows that your added functionality is working as expected. You can take cues from the Testing section in this PR.
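As a rough idea of what such a script could look like, here is a minimal sketch. The `check_language_id` helper, the sample sentences, and the expected label strings (`"telugu"`, `"hindi"`) are assumptions for illustration; the classifier is passed in as a callable so the check itself does not depend on how the model is loaded.

```python
def check_language_id(identify, samples):
    """Run identify on each text and collect mismatches.

    identify: any callable mapping a string to a language name.
    samples:  iterable of (text, expected_language) pairs.
    Returns a list of (text, expected, predicted) for failed samples.
    """
    failures = []
    for text, expected in samples:
        # Normalise case and whitespace; the exact label format is an assumption.
        predicted = str(identify(text)).strip().lower()
        if predicted != expected.lower():
            failures.append((text, expected, predicted))
    return failures


# Hypothetical samples: one new-language sentence plus one existing language,
# to confirm that adding Telugu did not break earlier predictions.
SAMPLES = [
    ("నేను తెలుగు మాట్లాడతాను", "telugu"),
    ("मैं हिंदी बोलता हूँ", "hindi"),
]
```

With iNLTK installed, `identify_language` from `inltk.inltk` could be passed as `identify`; an empty result from `check_language_id(identify_language, SAMPLES)` would mean every sample was labelled as expected.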
Again, thanks for your contribution and great work. Apologies for the delayed response.