TextAnalysis.jl

Should have a pluggable stemmer interface

Open sambitdash opened this issue 8 years ago • 3 comments

Julia should have a pluggable stemmer interface that supports POS tagging, or at least lemmatizer support such as "Wordnet".

A native Julia Porter2 stemmer is already available at: https://github.com/mguzmann/CorpusTools/blob/master/src/PortStemmer.jl

sambitdash avatar Aug 23 '17 01:08 sambitdash

That code is presumably English only. We use Snowball C, which supports many languages. It may be worthwhile to contribute a pure Julia back end to Snowball rather than writing a stemmer by hand.

aviks avatar Aug 23 '17 18:08 aviks

Yes, that one is English only. I was interested in support for multiple stemmers that you can plug in, as in NLTK. A lemmatizer like Wordnet would be a good value add. I was not suggesting that we move away from Snowball.

sambitdash avatar Aug 24 '17 08:08 sambitdash
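To make the "pluggable" idea concrete, here is a minimal sketch of what such an interface could look like using Julia's multiple dispatch. Every name in it (`AbstractStemmer`, `SuffixStemmer`, `DictLemmatizer`, `stem`, `stem_all`) is hypothetical and is not part of TextAnalysis.jl, Snowball, or WordNet.jl. The toy suffix stripper only stands in for a real backend such as Snowball.

```julia
# Hypothetical interface sketch; none of these names exist in TextAnalysis.jl.
abstract type AbstractStemmer end

# Toy backend: strips the first matching suffix (a stand-in for Snowball).
struct SuffixStemmer <: AbstractStemmer
    suffixes::Vector{String}
end

function stem(s::SuffixStemmer, word::AbstractString)
    for suf in s.suffixes
        # Only strip when at least three characters remain.
        if endswith(word, suf) && length(word) - length(suf) >= 3
            return String(chop(word; tail = length(suf)))
        end
    end
    return String(word)
end

# Toy backend: dictionary lookup (a stand-in for a WordNet-style lemmatizer).
struct DictLemmatizer <: AbstractStemmer
    lemmas::Dict{String,String}
end

# Unknown words are returned unchanged.
stem(l::DictLemmatizer, word::AbstractString) = get(l.lemmas, word, String(word))

# Generic code depends only on the abstract type, so backends are swappable.
stem_all(s::AbstractStemmer, words) = [stem(s, w) for w in words]
```

With this shape, adding a backend would mean defining a new subtype and one `stem` method. Callers such as `stem_all` would not need to change.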

A WordNet-based stemmer should be relatively easy to build on top of https://github.com/jbn/WordNet.jl

aviks avatar Sep 07 '17 19:09 aviks