TextAnalysis.jl
Stemming issue for certain words, e.g. providing -> provid
Some words are stemmed to truncated forms that are not real words (see the examples below). This is probably a libstemmer issue, but that repo doesn't seem to be active, so I'm posting here :-)
julia> using TextAnalysis
julia> sm = TextAnalysis.stemmer_for_document(StringDocument("hello"))
Stemmer algorithm:english encoding:UTF_8
julia> stem(sm, "coming")
"come"
julia> stem(sm, "coding")
"code"
julia> stem(sm, "providing")
"provid"
julia> stem(sm, "improvising")
"improvis"
julia> stem(sm, "pursuing")
"pursu"
Not sure what we can do about this on our side. Everyone just seems to use the Snowball stemmer.
Snowball: https://snowballstem.org/, and a wrapper: https://github.com/JuliaText/Snowball.jl