kefir
Consider using a dictionary
With a dictionary you can identify exceptional cases, which is not possible by inspecting the letters alone.
One possibility is Zemberek: https://github.com/ahmetaa/zemberek-nlp/blob/master/morphology/src/main/resources/tr/master-dictionary.dict
These are the rules: https://github.com/ahmetaa/zemberek-nlp/wiki/Text-Dictionary-Rules
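To make the idea concrete, here is a minimal Python sketch of turning such dictionary lines into a lookup table of exceptional words. It assumes each entry has the shape `word [Key:Value1, Value2; Key2:Value]` and that lines starting with `#` are comments. The rules page linked above is the authoritative description, so treat this as an approximation. The sample lines are illustrative and not copied from the actual file, and `parse_entry` and `load_exceptions` are hypothetical helper names.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

Meta = Dict[str, List[str]]

# Assumed entry shape: a word, optionally followed by "[...]" metadata.
_ENTRY = re.compile(r"^(?P<word>[^\[\]]+?)\s*(?:\[(?P<meta>[^\]]*)\])?\s*$")


def parse_entry(line: str) -> Optional[Tuple[str, Meta]]:
    """Parse one dictionary line; return None for blanks, comments, or unmatched lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = _ENTRY.match(line)
    if match is None:
        return None
    meta: Meta = {}
    # Assumption: metadata groups are separated by ';' and values by ','.
    for part in (match.group("meta") or "").split(";"):
        if ":" not in part:
            continue
        key, _, values = part.partition(":")
        meta[key.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return match.group("word").strip(), meta


def load_exceptions(lines: Iterable[str]) -> Dict[str, Meta]:
    """Keep only the words that carry explicit metadata."""
    table: Dict[str, Meta] = {}
    for line in lines:
        entry = parse_entry(line)
        if entry is not None and entry[1]:
            table[entry[0]] = entry[1]
    return table


# Illustrative sample, not taken from the real dictionary file.
SAMPLE = [
    "# comment line",
    "kitap",
    "saat [A:InverseHarmony]",
    "at [P:Noun; A:NoVoicing, InverseHarmony]",
]

exceptions = load_exceptions(SAMPLE)
```

A suffixing routine could consult `exceptions` first and fall back to the letter-based rules only when a word is not listed.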
Zemberek also has a binary version based on Protocol Buffers, which stores preprocessed attributes and loads quickly. It should be possible to read it with Python directly. Proto: https://github.com/ahmetaa/zemberek-nlp/blob/master/morphology/src/main/proto/lexicon.proto
Binary dictionary: https://github.com/ahmetaa/zemberek-nlp/blob/master/morphology/src/main/resources/tr/lexicon.bin