cleartext-mac
Find base form of word before looking up in simple list
The English list currently contains more than 1,000 words because it has to include conjugated forms. If we use NSLinguisticTagSchemeLemma instead, we can find the root word (the lemma) before we look it up in the word list.
I propose modifying the isSimple function in the SimpleWords class and adding a lemmaForWord(word: String) function.
Here's an example to get you started:
import Foundation

// Swift 2 syntax: print the lemma (base form) of each word in the sentence.
let question = "We were lovers"
// Skip whitespace and punctuation tokens, and return multi-word names as a single token.
let options: NSLinguisticTaggerOptions = [.OmitWhitespace, .OmitPunctuation, .JoinNames]
let schemes = NSLinguisticTagger.availableTagSchemesForLanguage("en")
let tagger = NSLinguisticTagger(tagSchemes: schemes, options: Int(options.rawValue))
tagger.string = question
// Walk every word token and ask for its tag in the lemma scheme.
tagger.enumerateTagsInRange(NSMakeRange(0, (question as NSString).length), scheme: NSLinguisticTagSchemeLemma, options: options) { (tag, tokenRange, _, _) in
    let token = (question as NSString).substringWithRange(tokenRange)
    print("\(token): \(tag)")
}
It prints:
We: we
were: be
lovers: lover
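A minimal sketch of how lemmaForWord and isSimple might fit together, written in current Swift (4+) syntax rather than the Swift 2 syntax used above. Everything except the names SimpleWords, isSimple and lemmaForWord is an assumption for illustration: the initializer, the lowercased `words` set and the injectable `lemmatize` closure are hypothetical and not the project's existing API. The tagger call is wrapped in `#if os(macOS)`; on other platforms the sketch falls back to the lowercased word.

```swift
import Foundation

// Hypothetical sketch of the proposed SimpleWords change.
// The initializer, `words` and the injectable `lemmatize` closure are
// assumptions for illustration, not the project's existing API.
class SimpleWords {
    private let words: Set<String>
    private let lemmatize: (String) -> String

    init(words: [String],
         lemmatize: @escaping (String) -> String = { SimpleWords.lemmaForWord(word: $0) }) {
        // Store the list lowercased so lookups are case-insensitive.
        self.words = Set(words.map { $0.lowercased() })
        self.lemmatize = lemmatize
    }

    /// Returns the lemma (base form) of `word`, e.g. "were" -> "be",
    /// or the lowercased word itself when no lemma is available.
    static func lemmaForWord(word: String) -> String {
        let fallback = word.lowercased()
        #if os(macOS)
        guard !word.isEmpty else { return fallback }
        let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
        tagger.string = word
        // Ask for the lemma of the token that starts at character index 0.
        if let lemma = tagger.tag(at: 0, scheme: .lemma, tokenRange: nil, sentenceRange: nil) {
            return lemma.rawValue.lowercased()
        }
        #endif
        return fallback
    }

    /// A word counts as simple if it is in the list as written,
    /// or if its lemma is in the list.
    func isSimple(word: String) -> Bool {
        if words.contains(word.lowercased()) {
            return true
        }
        return words.contains(lemmatize(word).lowercased())
    }
}
```

Checking the word as written first keeps the current lists, which already contain conjugated forms, working unchanged; the lemma lookup only kicks in for words the list lacks. Injecting the lemmatizer also lets the lookup logic be tested without the tagger. On macOS, `SimpleWords(words: ["be", "lover"]).isSimple(word: "were")` goes through NSLinguisticTagger and should return true as long as the tagger lemmatizes "were" to "be", as in the output above.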
Once this is in place, we can look into making the lists longer and giving the user the option of writing with anywhere from the 1,000 to the 10,000 most common words.
https://developer.apple.com/library/mac/documentation/Cocoa/Reference/NSLinguisticTagger_Class/#//apple_ref/c/data/NSLinguisticTagSchemeLemma