TextAnalysis.jl

stemming issue for certain words e.g. providing -> provid

Open tk3369 opened this issue 7 years ago • 1 comments

Some words are not stemmed properly. This is probably a libstemmer issue, but that repo doesn't seem to be active, so I'm posting here :-)

julia> sm = TextAnalysis.stemmer_for_document(StringDocument("hello"))
Stemmer algorithm:english encoding:UTF_8

julia> stem(sm, "coming")
"come"

julia> stem(sm, "coding")
"code"

julia> stem(sm, "providing")
"provid"

julia> stem(sm, "improvising")
"improvis"

julia> stem(sm, "pursuing")
"pursu"

tk3369 Feb 05 '18 01:02

Not sure what we can do about this. Everyone just seems to use the Snowball stemmer.

aviks Feb 05 '18 11:02

See https://snowballstem.org/ and a Julia wrapper for it: https://github.com/JuliaText/Snowball.jl

rssdev10 Oct 27 '23 16:10
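
For anyone re-checking this against another stemmer, the outputs reported in the first post can be collected into a small comparison. This is only a sketch: `stem_mismatches` and `replay` are hypothetical names, and the "expected" forms are an assumption about what the reporter wanted (the thread only gives "providing -> provid"). The stemmer is replaced by a lookup of the observed results, so nothing here calls TextAnalysis.jl or Snowball.jl.

```julia
# Observed stems, copied from the REPL session in the original report.
const OBSERVED = Dict(
    "coming"      => "come",
    "coding"      => "code",
    "providing"   => "provid",
    "improvising" => "improvis",
    "pursuing"    => "pursu",
)

# Assumption: the dictionary forms the reporter appears to expect.
# These values are not stated in the thread.
const EXPECTED = Dict(
    "coming"      => "come",
    "coding"      => "code",
    "providing"   => "provide",
    "improvising" => "improvise",
    "pursuing"    => "pursue",
)

# Hypothetical helper: apply `stemfn` to each word (in sorted order) and
# return the (word, got, wanted) triples where the result differs.
function stem_mismatches(stemfn, expected)
    out = Tuple{String,String,String}[]
    for word in sort(collect(keys(expected)))
        got = stemfn(word)
        got == expected[word] || push!(out, (word, got, expected[word]))
    end
    return out
end

# Stand-in for a real stemmer: replay the observed outputs.
replay(word) = OBSERVED[word]

for (word, got, wanted) in stem_mismatches(replay, EXPECTED)
    println(word, ": got ", got, ", expected ", wanted)
end
```

With TextAnalysis.jl loaded and `sm` built as in the report, passing `w -> stem(sm, w)` in place of `replay` should run the same check against the live stemmer.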