headliner
Merge a cluster like "trumps" into "trump" if a "trump" cluster already exists. Otherwise, leave it alone.
You can probably use or modify an existing stemming or lemmatization algorithm or library for this (see https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html for definitions). I believe Python's NLTK already supports both. I would implement this myself, but I don't have the bandwidth in my work schedule.
We've had problems deploying NLTK to Heroku before, but using it here sounds worth investigating.
There are almost certainly other alternatives, but NLTK is probably the most commonly used.
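For what it's worth, the merge rule itself doesn't have to depend on any particular library. Below is a rough sketch, not taken from the headliner code: it assumes clusters are stored as a dict mapping a keyword to a list of headline IDs, and the names `naive_stem` and `merge_clusters` are made up for illustration. The default stemmer only strips a trailing "s", so it is a placeholder; something like `nltk.stem.PorterStemmer().stem` could be passed in as `stem` instead if NLTK turns out to be deployable.

```python
from typing import Callable, Dict, List


def naive_stem(word: str) -> str:
    """Placeholder stemmer (an assumption, not a real algorithm):
    drop a single trailing 's' from words longer than three letters."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def merge_clusters(
    clusters: Dict[str, List[str]],
    stem: Callable[[str], str] = naive_stem,
) -> Dict[str, List[str]]:
    """Fold a cluster into its stem's cluster only if that stem
    already exists as a cluster key; otherwise keep it unchanged.

    Single pass: chains of stems are not followed.
    """
    merged: Dict[str, List[str]] = {}
    for key, items in clusters.items():
        base = stem(key)
        # Merge only when the stemmed form is a different, existing cluster.
        target = base if base != key and base in clusters else key
        merged.setdefault(target, []).extend(items)
    return merged


if __name__ == "__main__":
    example = {"trump": ["h1"], "trumps": ["h2"], "clintons": ["h3"]}
    print(merge_clusters(example))
    # {'trump': ['h1', 'h2'], 'clintons': ['h3']}
```

Note that "clintons" stays as-is because there is no "clinton" cluster to merge into, which matches the "otherwise don't" part of the request. A real stemmer or lemmatizer would handle far more cases than the placeholder, but may also merge words more aggressively than we want.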