Muthu Annamalai (முத்து அண்ணாமலை) comments

Results: 69 comments by Muthu Annamalai (முத்து அண்ணாமலை)

Bad IME checking rule - additional rule

Test data could be, ``` இது பற்றி இன்று hackernews ல் பார்த்தேன். தமிழுக்கு ஏதாவது உதவுமா பாருங்களேன். நான் பார்த்ததில் இந்த குியீட்டில் 65536-மேல் உள்ள எளுத்துக்கள் நிரலாக்க உதவிக்காக மட்டும் உறுவாக்கப்பட்டது. தமிழுக்கு இது நேரடியாக பயனளிப்பதாக...
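One kind of bad IME output visible in the test data above is a consonant followed by two vowel signs in a row (for example கு followed directly by ி). A minimal sketch of such a check is shown here, assuming that "two or more consecutive Tamil signs after NFC normalization" is a reasonable rule; the function name `find_stacked_signs` is hypothetical and is not part of open-tamil.

```python
import re
import unicodedata

# Tamil dependent vowel signs and pulli (U+0BBE..U+0BCD) plus the AU length mark (U+0BD7).
# Assumption: in NFC-normalized Tamil text, two of these never appear back to back.
_STACKED_SIGNS = re.compile("[\u0BBE-\u0BCD\u0BD7]{2,}")


def find_stacked_signs(text):
    """Return (offset, run) pairs for runs of two or more consecutive Tamil signs.

    Offsets refer to the NFC-normalized form of the input, so that a
    decomposed two-part vowel sign (U+0BC6 + U+0BBE) is not reported as an error.
    """
    normalized = unicodedata.normalize("NFC", text)
    return [(m.start(), m.group()) for m in _STACKED_SIGNS.finditer(normalized)]


if __name__ == "__main__":
    # KA + sign U + sign I + YA: the two signs in a row are flagged.
    print(find_stacked_signs("\u0B95\u0BC1\u0BBF\u0BAF"))
```

A rule like this only flags suspicious code point sequences; it says nothing about whether the word itself is spelled correctly.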

Unigram data from Project Madurai, Wikipedia

The first commit, b9b368545f70b5b0c7d894cb66349b8baa5a5679, uses data from உளிவீரன் (Uliveeran): https://github.com/Ezhil-Language-Foundation/uliveeran

Tamil - English unified word dictionary (தமிழ் - ஆங்கிலம் ஒருங்கிணைந்த சொல் அகராதி)

The first step towards this work is to have a parallel dictionary.

Corpus word set for Solthiruthi

@VpkPrasanna - yes, you can use these datasets to form a valid word list for the spelling checker; the current word lists include https://github.com/Ezhil-Language-Foundation/open-tamil/blob/main/solthiruthi/data/tamilvu_dictionary_words.txt etc.
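Forming a word list from a dataset could look roughly like the sketch below. It assumes the input has one word per line, which is an assumption about the file layout rather than something confirmed here; `build_word_list` and `is_known` are hypothetical helper names, not open-tamil or Solthiruthi APIs.

```python
import unicodedata


def build_word_list(lines):
    """Collect unique, non-empty words from an iterable of text lines.

    Assumption: one word per line. Words are NFC-normalized so that
    visually identical Tamil words compare equal.
    """
    words = set()
    for line in lines:
        word = unicodedata.normalize("NFC", line.strip())
        if word:
            words.add(word)
    return words


def is_known(word, words):
    """Return True if the NFC-normalized word is in the word list."""
    return unicodedata.normalize("NFC", word.strip()) in words
```

For a local copy of a word file, the lines can come straight from `open(path, encoding="utf-8")`, since a file object iterates over its lines.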

Add stemmer functionality to open-tamil

Thanks for tracking this issue, Shrini.

Add stemmer functionality to open-tamil

Making Damodaran's tamil-stemmer available is a high priority: https://github.com/rdamodharan/tamil-stemmer

Solthiruthi - framework - simplify letters

Thanks for investigating, Shrini. I added my comments.

Solthiruthi - framework - simplify letters

@jesuruban - my name is Muthu Annamalai :-) On Windows I use the Python 3 IDE to see the Tamil fonts rendered correctly; if you are on Windows, the Emacs shell also works great....

TACE-16

Sathia, we can prototype a TACE encoding in the common-use area of the regular 16-bit Unicode range. Please add a TACE-16 to UTF-8 converter, and the reverse, to this...

Create a noun list

Currently, the best bet is to ask the teams from Anna University or Amrita for their consent to reuse the datasets. Otherwise we may have to manually compile the datasets from...