Find base form of word before looking up in simple list

Open mortenjust opened this issue 9 years ago • 0 comments

The English list currently needs more than 1,000 words in order to cover conjugations. If we use NSLinguisticTagSchemeLemma, we can find the root form (lemma) of a word first and then look that up in the word list.

I propose modifying the isSimple function in the SimpleWords class and adding a lemmaForWord(word: String) function.

Here's an example to get you started:

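import Foundation

// Swift 2 syntax, as current when this issue was opened.
let question = "We were lovers"
// Skip whitespace and punctuation tokens, and treat multi-word names as one token.
let options: NSLinguisticTaggerOptions = [.OmitWhitespace, .OmitPunctuation, .JoinNames]
let schemes = NSLinguisticTagger.availableTagSchemesForLanguage("en")
let tagger = NSLinguisticTagger(tagSchemes: schemes, options: Int(options.rawValue))
tagger.string = question
// Ranges are in UTF-16 units, so measure the string through NSString.
let fullRange = NSMakeRange(0, (question as NSString).length)
// Visit each token and print it next to its lemma.
tagger.enumerateTagsInRange(fullRange, scheme: NSLinguisticTagSchemeLemma, options: options) { (tag, tokenRange, _, _) in
    let token = (question as NSString).substringWithRange(tokenRange)
    print("\(token): \(tag)")
}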

It prints:

We: we
were: be
lovers: lover
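
Building on the example above, the proposed change could look roughly like the sketch below. It is written in current Swift syntax rather than the Swift 2 syntax used above, and it assumes an Apple platform where NSLinguisticTagger is available. The shape of SimpleWords here (a `words` set and an initializer) is hypothetical; only the names isSimple and lemmaForWord come from the proposal. Note that tagging a single word without its sentence gives the tagger less context, so the lemma may be missing or differ from what the sentence-level example produces.

```swift
import Foundation

// Hypothetical stand-in for the project's SimpleWords class.
final class SimpleWords {
    // Assumed to hold lowercase base forms only.
    let words: Set<String>

    init(words: Set<String>) {
        self.words = words
    }

    // Returns the lemma of a single word, or nil when the tagger has none.
    func lemmaForWord(_ word: String) -> String? {
        guard !word.isEmpty else { return nil }
        let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
        tagger.string = word
        let tag = tagger.tag(at: 0, unit: .word, scheme: .lemma, tokenRange: nil)
        guard let lemma = tag?.rawValue, !lemma.isEmpty else { return nil }
        return lemma.lowercased()
    }

    // A word is simple if it, or failing that its lemma, is in the list.
    func isSimple(_ word: String) -> Bool {
        let lowered = word.lowercased()
        if words.contains(lowered) { return true }
        guard let lemma = lemmaForWord(lowered) else { return false }
        return words.contains(lemma)
    }
}
```

Checking the word itself before asking for its lemma keeps the existing behavior for words already in the list and only falls back to the tagger for inflected forms.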

Once this is in place, we can look into making the lists longer and giving the user the option of writing with the 1,000 to 10,000 most common words.
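
One way the user-selectable list size could work, as a minimal sketch: keep a single frequency-ranked list and cut it at the chosen size. The function name simpleWordSet and the assumption that the list is sorted from most to least common are both hypothetical and not part of the existing code.

```swift
// Hypothetical helper: `rankedWords` is assumed to be sorted from
// most common to least common.
func simpleWordSet(from rankedWords: [String], limit: Int) -> Set<String> {
    // Clamp at zero because prefix(_:) traps on a negative count.
    let count = max(0, limit)
    return Set(rankedWords.prefix(count).map { $0.lowercased() })
}

// Usage: the user picks a size, for example 1,000 or 10,000.
let ranked = ["the", "be", "to", "of"]
let allowed = simpleWordSet(from: ranked, limit: 2)
```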

https://developer.apple.com/library/mac/documentation/Cocoa/Reference/NSLinguisticTagger_Class/#//apple_ref/c/data/NSLinguisticTagSchemeLemma

mortenjust, Apr 11 '16 05:04