
Merge clusters like "trumps" if "trump" exists. Otherwise don't.

Open campbellcompton opened this issue 8 years ago • 3 comments

campbellcompton avatar Mar 10 '17 20:03 campbellcompton

You can probably use or modify an existing stemming or lemmatization algorithm or library for this (see https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html for definitions). I believe Python's NLTK already supports both. I would implement this myself, but I don't have the bandwidth in my work schedule.
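For illustration, here is a rough sketch of the rule from the issue title: fold a cluster like "trumps" into "trump" only when "trump" already exists as a cluster. It assumes clusters are stored as a dict mapping a lowercase keyword to a list of headlines, which may not match the project's actual data structure. The names `naive_base_forms` and `merge_clusters` are hypothetical, and the suffix stripping is a deliberately crude stand-in for a real stemmer or lemmatizer.

```python
from typing import Callable, Dict, List


def naive_base_forms(word: str) -> List[str]:
    """Return candidate base forms for a simple plural or possessive.

    Crude suffix stripping for illustration only; a real stemmer or
    lemmatizer would handle far more cases.
    """
    candidates = []
    if word.endswith("'s"):
        candidates.append(word[:-2])
    if word.endswith("es"):
        candidates.append(word[:-2])
    if word.endswith("s"):
        candidates.append(word[:-1])
    # Drop empty strings produced by very short inputs such as "s".
    return [c for c in candidates if c]


def merge_clusters(
    clusters: Dict[str, List[str]],
    base_forms: Callable[[str], List[str]] = naive_base_forms,
) -> Dict[str, List[str]]:
    """Merge each cluster into its base-form cluster, if that cluster exists.

    Assumes keys are lowercase keywords (a hypothetical layout). Clusters
    whose base form is not already a key are left untouched.
    """
    merged = {key: list(items) for key, items in clusters.items()}
    # Visit longer keys first so that chained forms end up in the shortest base.
    for key in sorted(clusters, key=len, reverse=True):
        for base in base_forms(key):
            if base != key and base in merged:
                merged[base].extend(merged.pop(key))
                break
    return merged


example = {"trump": ["a"], "trumps": ["b"], "bus": ["c"]}
result = merge_clusters(example)
```

Because `base_forms` is a parameter, a library stemmer could be swapped in later. With NLTK, for example, something like `lambda w: [PorterStemmer().stem(w)]` (using `nltk.stem.PorterStemmer`) should fit the same slot, though I have not tested that against this project's data.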

evancofer avatar Mar 15 '18 17:03 evancofer

We've had problems deploying NLTK to Heroku before, but it sounds like using it here would be worth investigating.

dgarrick avatar Mar 19 '18 13:03 dgarrick

There are almost certainly other alternatives, but NLTK is probably the most commonly used.

evancofer avatar Mar 19 '18 13:03 evancofer