Muthu Annamalai (முத்து அண்ணாமலை)
It is possible, and legal, to have Tamil input like ொ = ெ + ா, i.e. the one-code-point and two-code-point encodings convey the same letter. #grindingteeth #Tamil Ref: http://aspell.net/man-html/Unicode-Normalization.html#Unicode-Normalization ```...
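One way to make the two encodings compare equal is Unicode normalization. The sketch below uses only the standard-library `unicodedata` module; the helper name `same_tamil_text` is hypothetical and not part of open-tamil.

```python
import unicodedata

TWO_CODEPOINTS = "\u0bc6\u0bbe"  # vowel sign E followed by vowel sign AA
ONE_CODEPOINT = "\u0bca"         # vowel sign O as a single code point


def same_tamil_text(a, b):
    """Compare two strings after composing them to NFC.

    Hypothetical helper: NFC folds the two-code-point sequence into the
    single code point, so both spellings of the same letter match.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


ka = "\u0b95"  # the consonant KA, used as a base for the vowel sign
print(ka + TWO_CODEPOINTS == ka + ONE_CODEPOINT)                 # raw comparison
print(same_tamil_text(ka + TWO_CODEPOINTS, ka + ONE_CODEPOINT))  # normalized
```

Normalizing once at the input boundary (for example, when words are read from a document or a dictionary) would keep every later comparison simple.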
Functions in Tamil for noun-case removal (similar to #47) 1. Identify the singular, plural, third-person, and pronoun modifications of a noun, and extract the root word. Ref: http://learning-tamil.blogspot.com/2009/12/index-of-noun-cases.html Ref: letsgrammar.org
The goal is to reuse the Tamil grammar contributed by Elanjelian Venugopal. 1. Write an XML parser to load the data from file. Ref: https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/ta/src/main/resources/org/languagetool/rules/ta/grammar.xml 2. Write an interpreter for the rules from 1...
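The two steps could start out roughly as follows. This is a minimal sketch, not the LanguageTool rule engine: it assumes rules shaped as `<rule>` elements with a `<pattern>` of literal `<token>` children and a `<message>`, and the inline `SAMPLE` document is invented for illustration rather than copied from grammar.xml. Token attributes such as regular expressions, rule groups, and any DTD entities in the real file are not handled here.

```python
import xml.etree.ElementTree as ET

# Invented sample, shaped like a LanguageTool rule file (assumption).
SAMPLE = """<rules lang="ta">
  <category name="demo">
    <rule id="DEMO_RULE" name="demo rule">
      <pattern>
        <token>foo</token>
        <token>bar</token>
      </pattern>
      <message>Join these two words.</message>
    </rule>
  </category>
</rules>"""


def load_rules(xml_text):
    """Step 1: parse rule id, literal tokens, and message from XML text."""
    root = ET.fromstring(xml_text)
    rules = []
    for rule in root.iter("rule"):
        tokens = [t.text or "" for t in rule.findall("./pattern/token")]
        message = rule.find("message")
        text = "".join(message.itertext()).strip() if message is not None else ""
        rules.append({"id": rule.get("id"), "tokens": tokens, "message": text})
    return rules


def apply_rules(rules, words):
    """Step 2: report (position, rule id, message) for each literal match."""
    hits = []
    for rule in rules:
        n = len(rule["tokens"])
        if n == 0:
            continue
        for i in range(len(words) - n + 1):
            if words[i:i + n] == rule["tokens"]:
                hits.append((i, rule["id"], rule["message"]))
    return hits


rules = load_rules(SAMPLE)
print(apply_rules(rules, ["a", "foo", "bar"]))
```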
With data like n-gram statistics for the letters of the Tamil language, we may provide services for: 1. detecting words in error 2. ranking suggestions based on their letter likelihood
- Task: Build n-gram letter analysis tools to work on a given corpus, e.g. Wikipedia, a website, a blog, etc. - Requirements 1. Letter frequency info 2. Especially bi- and tri-gram data...
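A letter-level n-gram counter might look like the sketch below. It assumes a "letter" is a base character plus any combining marks that follow it (vowel signs, pulli), detected through the standard-library `unicodedata` categories; `split_letters` and `ngram_counts` are hypothetical names, and conjuncts such as க்ஷ are not treated as a single letter here.

```python
import unicodedata
from collections import Counter


def split_letters(word):
    """Group each base character with the combining marks that follow it."""
    letters = []
    for ch in word:
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch   # vowel sign or pulli: attach to previous letter
        else:
            letters.append(ch)  # start a new letter
    return letters


def ngram_counts(words, n):
    """Count letter n-grams (as tuples) within each word of the corpus."""
    counts = Counter()
    for word in words:
        letters = split_letters(word)
        for i in range(len(letters) - n + 1):
            counts[tuple(letters[i:i + n])] += 1
    return counts


corpus = ["\u0b85\u0bae\u0bcd\u0bae\u0bbe", "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"]  # அம்மா, தமிழ்
print(ngram_counts(corpus, 1).most_common(3))
print(ngram_counts(corpus, 2).most_common(3))
```

Counting within words, rather than across word boundaries, keeps the statistics usable for scoring a single candidate word.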
Spell checking without context is inherently an 'embarrassingly parallel' problem: each word can be checked independently of every other word.
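Because each word is checked on its own, the work can be mapped over a pool of workers. The sketch below is only an illustration: `KNOWN_WORDS` is a two-word stand-in for a real dictionary, and `is_known` and `check_words` are hypothetical names. A thread pool is used to show the structure; for CPU-bound checks, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in dictionary (assumption): அம்மா and வா.
KNOWN_WORDS = frozenset({"\u0b85\u0bae\u0bcd\u0bae\u0bbe", "\u0bb5\u0bbe"})


def is_known(word):
    """Context-free check: depends on this word alone."""
    return word in KNOWN_WORDS


def check_words(words, workers=4):
    """Check all words concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(is_known, words))


print(check_words(["\u0b85\u0bae\u0bcd\u0bae\u0bbe", "xyz", "\u0bb5\u0bbe"]))
```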
Solthiruthi - queues - document flow 1. Split words from document into entity/non-entity with line/col info attributes 2. Each entity has 'is_error' Boolean attribute, and a 'reason' list of strings...
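The first two steps of this flow could be modelled as below. This is a sketch under stated assumptions: an "entity" is taken to be a run of characters from the Tamil Unicode block (U+0B80 to U+0BFF) and everything else is a non-entity, lines are numbered from 1, and columns are 0-based code-point offsets. The `Token` class and `split_document` function are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Either a run of Tamil-block characters or a run of anything else.
TOKEN = re.compile(r"[\u0B80-\u0BFF]+|[^\u0B80-\u0BFF]+")


@dataclass
class Token:
    text: str
    line: int        # 1-based line number
    col: int         # 0-based offset within the line
    is_entity: bool  # True for Tamil words (assumption)
    is_error: bool = False
    reason: list = field(default_factory=list)  # why the token was flagged


def split_document(text):
    """Split a document into entity and non-entity tokens with positions."""
    tokens = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN.finditer(line):
            chunk = match.group()
            is_entity = "\u0b80" <= chunk[0] <= "\u0bff"
            tokens.append(Token(chunk, line_no, match.start(), is_entity))
    return tokens


for tok in split_document("\u0b85\u0bae\u0bcd\u0bae\u0bbe, \u0bb5\u0bbe\nxyz"):
    print(tok.line, tok.col, tok.is_entity, repr(tok.text))
```

A later pass can then set `is_error` and append to `reason` on each entity without touching the position data.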
https://ezhillang.wordpress.com/2015/04/22/solthiruthi-multi-pass-spell-checker-for-tamil-language-draft-1/
Take words from the dictionary and form a text. Introduce known errors and form a test case following the format of https://github.com/arcturusannamalai/open-tamil/issues/26
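A generator for such test cases could be sketched as follows. The assumptions are loud ones: the only error type shown is dropping one letter, the result is returned as a plain text plus a list of `(position, wrong, correct)` tuples rather than in the format of the issue linked above, and `split_letters`, `drop_letter`, and `make_test_case` are hypothetical names. A dropped letter can also happen to produce another valid word, which a real generator would need to filter out.

```python
import random
import unicodedata


def split_letters(word):
    """Group each base character with its following combining marks."""
    letters = []
    for ch in word:
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def drop_letter(word, rng):
    """Known error: remove one randomly chosen letter (needs 2+ letters)."""
    letters = split_letters(word)
    if len(letters) < 2:
        return word
    i = rng.randrange(len(letters))
    return "".join(letters[:i] + letters[i + 1:])


def make_test_case(dictionary, n_words, error_rate, seed=0):
    """Build a text from dictionary words and record each injected error."""
    rng = random.Random(seed)  # seeded, so the test case is reproducible
    words, expected = [], []
    for pos in range(n_words):
        correct = rng.choice(dictionary)
        word = correct
        if rng.random() < error_rate:
            word = drop_letter(correct, rng)
        if word != correct:
            expected.append((pos, word, correct))
        words.append(word)
    return " ".join(words), expected


sample_dictionary = ["\u0b85\u0bae\u0bcd\u0bae\u0bbe", "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"]
text, expected = make_test_case(sample_dictionary, 10, 0.3, seed=1)
print(text)
print(expected)
```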