open-tamil
Open Source Tamil NLP Tools - a Tamil natural language analysis toolkit
Test program: open-tamil/tests/solthiruthi_suffixremoval.py See: http://letsgrammar.org/declension.php
Apply the Norvig spelling-correction algorithm to find alternatives for mayangoli sorkal (words with easily confused sounds). Ref: http://www.valaitamil.com/list-of-mayankoli-sorkal_15177.html
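A minimal sketch of how a Norvig-style lookup could work for confusable sounds: generate single-letter substitutions, keep only the known words, and rank them by frequency. The confusable groups in `MAYANGOLI_GROUPS`, the function names, and the `word_counts` dictionary are assumptions made for illustration; they are not taken from open-tamil.

```python
# Assumed groups of consonants that are commonly confused by sound.
MAYANGOLI_GROUPS = ["லளழ", "ரற", "நனண"]


def mayangoli_edits(word):
    """Return every word that differs from `word` by one confusable consonant."""
    results = set()
    for i, ch in enumerate(word):
        for group in MAYANGOLI_GROUPS:
            if ch in group:
                for alt in group:
                    if alt != ch:
                        # Swap only the base consonant; vowel signs stay in place.
                        results.add(word[:i] + alt + word[i + 1:])
    return results


def alternatives(word, word_counts):
    """Keep the edits found in `word_counts`, most frequent first."""
    known = [w for w in mayangoli_edits(word) if w in word_counts]
    return sorted(known, key=lambda w: (-word_counts[w], w))
```

Here `word_counts` stands in for any word-frequency table. Only one substitution per candidate is tried, so words that differ in two confusable letters would need a second pass over the results.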
Corpus - given a corpus, generate unigram and bigram data. This data can be used in tasks such as word prediction and spelling correction. The analysis task is captured...
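As a rough illustration of the unigram and bigram counting described here, the following sketch counts whitespace-separated words. The function name `ngram_counts` and the whitespace tokenisation are assumptions; a real corpus would likely need Tamil-aware tokenisation and punctuation handling.

```python
from collections import Counter


def ngram_counts(sentences):
    """Count unigrams and bigrams over an iterable of sentence strings."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.split()  # assumption: words are whitespace-separated
        unigrams.update(words)
        # Pair each word with its successor within the same sentence.
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams
```

Bigrams are not counted across sentence boundaries, which keeps unrelated words from being paired.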
Corpus - build an n-gram predictor using a language model. Get data from task #97 to build Bayesian filters and n-gram predictors.
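One simple form such a predictor could take is a bigram model that suggests the most frequent followers of a word. This is only a sketch: `build_bigram_model` and `predict_next` are hypothetical names, the probabilities are plain relative frequencies with no smoothing, and the Bayesian filter part of the task is not covered.

```python
from collections import Counter, defaultdict


def build_bigram_model(sentences):
    """Map each word to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()  # assumption: whitespace tokenisation
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model


def predict_next(model, word, k=3):
    """Return up to k (next_word, relative_frequency) pairs, best first."""
    followers = model.get(word)
    if not followers:
        return []
    total = sum(followers.values())
    return [(w, c / total) for w, c in followers.most_common(k)]
```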
It is not practical to generate all n! permutations of an n-letter Tamil word. Instead, we can generate the next 100, 1000, etc. permutations of the given n-letter word....
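One way to produce only the next few permutations is the classic lexicographic next-permutation step, sketched below. It assumes the word has already been split into a list of Tamil letters (each letter may span several code points); the function names are illustrative and not part of open-tamil.

```python
def next_permutation(letters):
    """Return the lexicographically next permutation, or None if this is the last."""
    a = list(letters)
    # Find the rightmost position whose letter is smaller than its successor.
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:
        return None
    # Find the rightmost letter larger than a[i] and swap the two.
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    a[i], a[j] = a[j], a[i]
    # Reverse the tail so it becomes the smallest possible suffix.
    a[i + 1:] = reversed(a[i + 1:])
    return a


def following_permutations(letters, limit):
    """Collect at most `limit` permutations that follow `letters`."""
    found = []
    current = list(letters)
    while len(found) < limit:
        current = next_permutation(current)
        if current is None:
            break
        found.append(current)
    return found
```

Each step costs time linear in the word length, so asking for the next 100 or 1000 permutations never requires materialising all n! of them.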
The current Trie implementation keeps track of frequency and grows the tree, but we would also like the ability to delete words from the Trie.
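The requested deletion could look roughly like the sketch below: clear the word's count, then prune nodes that no longer lead to any word. The `Trie` and `TrieNode` classes here are a hypothetical stand-in written for illustration, not the project's existing implementation.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0  # how many times a word ending at this node was added


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1

    def frequency(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return 0
        return node.count

    def delete(self, word):
        """Remove `word`; return False if it was not stored."""
        path = [self.root]
        for ch in word:
            nxt = path[-1].children.get(ch)
            if nxt is None:
                return False
            path.append(nxt)
        if path[-1].count == 0:
            return False
        path[-1].count = 0
        # Walk back up, dropping nodes that hold no word and have no children.
        for i in range(len(word) - 1, -1, -1):
            child = path[i + 1]
            if child.count or child.children:
                break
            del path[i].children[word[i]]
        return True
```

Pruning stops at the first node that is still needed, so deleting a word never removes a longer word that shares its prefix.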
e.g. convert an English word transliterated into Tamil into the equivalent Tamil word using an English->Tamil dictionary: காலேஜூ -> college -> கல்லூரி
Develop a confusion matrix for Tamil keyboard models: 1. Tamil Typewriter 2. Tamil 99 Keyboard 3. Tamil Anjal. The confusion matrix gives the frequency of mistyping letter _i_ as letter _j_...
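To make the idea concrete, a confusion matrix can be held as a sparse table of (intended, typed) letter pairs with their counts. The sketch below assumes a list of (intended, typed) word pairs is available for one keyboard model, that each word is already a sequence of letters, and that only equal-length pairs (pure substitutions) are compared; all of these are simplifying assumptions.

```python
from collections import Counter


def confusion_matrix(pairs):
    """Count how often intended letter i was typed as letter j."""
    counts = Counter()
    for intended, typed in pairs:
        if len(intended) != len(typed):
            continue  # insertions and deletions would need an alignment step
        for i, j in zip(intended, typed):
            if i != j:
                counts[(i, j)] += 1
    return counts
```

A separate table would be built per keyboard layout, since which keys are adjacent (and therefore easy to confuse) differs between them.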
The permutation generation code as-is is very inefficient; it should use pre-calculated permutations for lengths 1-8 or so, and cache data as it goes along, to be efficient for longer calculations....
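One possible reading of this suggestion is to cache the index orderings per word length and reuse them for every word of that length. The sketch below uses `functools.lru_cache` for the cache; the function names are hypothetical, and the input is assumed to be a list of already-split Tamil letters.

```python
from functools import lru_cache
from itertools import permutations


@lru_cache(maxsize=None)
def index_permutations(n):
    """All orderings of range(n), computed once per length and then reused."""
    return tuple(permutations(range(n)))


def word_permutations(letters):
    """Apply the cached index orderings to a list of letters."""
    return [
        "".join(letters[i] for i in order)
        for order in index_permutations(len(letters))
    ]
```

The cache grows factorially with length (8 letters already gives 40320 orderings), so this only suits short words; longer ones are better served by generating permutations incrementally.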
TACE-16
Create a standard library that can: 1. Take any Unicode input and try to return the code points for TACE-16 2. Interpret complex characters and provide TACE-16 code points 3. String Manipulation on...