Consider using a dictionary

Open mdakin opened this issue 6 years ago • 1 comments

By using a dictionary you can identify exceptional cases, which is not possible by inspecting the letters alone.
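To illustrate the point, here is a minimal sketch in Python of a letter-based plural rule with a dictionary override. The function name `pluralize` and the small exception set are hypothetical and not part of kefir; in practice the set would be filled from a real dictionary. Words such as "saat" end in a back vowel but take the front-vowel suffix, which the letters alone cannot reveal.

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")

# Hypothetical, hand-written exception set for illustration only.
# These loanwords take front-vowel suffixes despite a back last vowel.
FRONT_HARMONY_EXCEPTIONS = {"saat", "alkol", "rol"}


def pluralize(word, exceptions=FRONT_HARMONY_EXCEPTIONS):
    """Append -lar/-ler by vowel harmony, consulting the exceptions first."""
    if word in exceptions:
        return word + "ler"
    # Letter-based rule: the last vowel of the word decides the suffix.
    for ch in reversed(word):
        if ch in BACK_VOWELS:
            return word + "lar"
        if ch in FRONT_VOWELS:
            return word + "ler"
    # No vowel found; falling back to -ler is an arbitrary choice here.
    return word + "ler"


print(pluralize("kitap"))                   # rule-based
print(pluralize("saat"))                    # dictionary override
print(pluralize("saat", exceptions=set()))  # what the letters alone give
```

Without the exception set, the last call produces the incorrect form "saatlar", which is the kind of case a dictionary lookup would catch.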

One possibility is Zemberek: https://github.com/ahmetaa/zemberek-nlp/blob/master/morphology/src/main/resources/tr/master-dictionary.dict

These are the rules: https://github.com/ahmetaa/zemberek-nlp/wiki/Text-Dictionary-Rules
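As a rough sketch of how such a text dictionary could be read from Python, the snippet below parses one entry per line. It assumes entries look roughly like `word [P:Noun; A:Voicing]`, with optional bracketed metadata and `#` starting a comment line; these details and the sample lines are assumptions for illustration, so check the rules page above for the actual syntax. The function name `parse_entry` is hypothetical.

```python
import re

# Assumed line shape: a word, optionally followed by "[Key:Val, Val; Key:Val]".
ENTRY_RE = re.compile(r"^(?P<word>[^\[\]]+?)\s*(?:\[(?P<meta>[^\]]*)\])?\s*$")


def parse_entry(line):
    """Return (word, attributes) for one line, or None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = ENTRY_RE.match(line)
    if match is None:
        return None  # line does not fit the assumed shape
    attrs = {}
    meta = match.group("meta") or ""
    for part in meta.split(";"):
        if ":" not in part:
            continue
        key, _, values = part.partition(":")
        attrs[key.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return match.group("word").strip(), attrs


# Made-up sample lines in the assumed format, not copied from the real file.
sample = ["# comment", "ev", "kitap [P:Noun; A:Voicing]"]
entries = [e for e in map(parse_entry, sample) if e is not None]
print(entries)
```

The parsed attributes could then be used to build exception sets, so that the suffixing logic consults the dictionary before applying letter-based rules.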

Zemberek also has a binary version based on protocol buffers, with preprocessed attributes and fast loading. It should be possible to read it with Python directly. Proto: https://github.com/ahmetaa/zemberek-nlp/blob/master/morphology/src/main/proto/lexicon.proto

Binary dictionary: https://github.com/ahmetaa/zemberek-nlp/blob/master/morphology/src/main/resources/tr/lexicon.bin

mdakin, Aug 15 '18 12:08