elasticsearch-skroutz-greekstemmer

rule0, exceptions handling

Open cmantas opened this issue 8 years ago • 0 comments

rule0 of SkroutzGreekStemmer.java tries to handle special cases for specific word endings. However, most of those cases concern whole words rather than endings. E.g., the word περατοσ is handled as an ending, so it also matches υδατοπερατοσ and stems it as υδατοπερ; likewise, σαφωσ matches the case for φωσ, etc.
These cases are false-positive matches.

Most of these cases should be handled with string equality rather than string suffix matching. This should happen in an extra step before what is now rule0, leaving rule0 with fewer special cases to handle.
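To illustrate the difference, here is a minimal sketch, not the actual code of SkroutzGreekStemmer.java. The class name, the map, and both method names are hypothetical. The stem for περατοσ is inferred from the υδατοπερ example above, and the stem for φωσ is an assumption made only for illustration.

```java
import java.util.Map;

class ExactExceptionsSketch {
    // Hypothetical whole-word exceptions (word -> stem).
    // "περ" is inferred from the example in this issue; "φω" is an assumed stem.
    static final Map<String, String> WHOLE_WORD_EXCEPTIONS = Map.of(
        "περατοσ", "περ",
        "φωσ", "φω"
    );

    // Suffix matching, as described in this issue: any word that merely
    // ends with an exception gets rewritten, which causes false positives.
    static String stemBySuffix(String word) {
        for (Map.Entry<String, String> e : WHOLE_WORD_EXCEPTIONS.entrySet()) {
            String ending = e.getKey();
            if (word.endsWith(ending)) {
                return word.substring(0, word.length() - ending.length()) + e.getValue();
            }
        }
        return null; // no special case applies
    }

    // Proposed extra step: only the exact word is treated as an exception.
    // A null result means the word should continue to rule0 and the later rules.
    static String stemByEquality(String word) {
        return WHOLE_WORD_EXCEPTIONS.get(word);
    }

    public static void main(String[] args) {
        System.out.println(stemBySuffix("υδατοπερατοσ"));   // υδατοπερ (false positive)
        System.out.println(stemByEquality("υδατοπερατοσ")); // null (left for later rules)
        System.out.println(stemByEquality("περατοσ"));      // περ
    }
}
```

With the equality lookup in place as a first step, only genuine endings would need to stay in rule0.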

cmantas avatar Sep 27 '16 09:09 cmantas