Replace porter_stemmer with snowball_stemmer
The Snowball/Porter2 stemming algorithm produces better stems than the original Porter stemmer for a significant subset of words, in the sense that its stems are closer to what one would expect.
I checked that all the unit tests pass after the change.
I suppose I could find the words for which the Porter2 stemmer differs from the original, then find packages on the Gleam registry that use those words in their descriptions, and show that the older stemmer wouldn't bring up those packages for a reasonable search term. This whole thing was sparked by "repeatedly" being stemmed to "repeatedli", which meant that searching for "repeat" wouldn't return the package that used it. Would more examples of that kind help here?
Or is it enough to just show that common words are stemmed differently between the algorithms?
Maybe a couple of tests showing some newly improved words, like "repeatedly", would be good. They would also help prevent regressions.
Sorry for the delay! I've added unit tests for -ly and -ist words. While there are a lot of other cases where the Porter and Snowball algorithms produce different stems, I selected words that met two criteria:
- The Porter stemmer had to produce a stem longer than a relevant search term. Otherwise, even with a sub-optimal stem, search would still work, since the search term would likely stem to the same thing.
- They had to be words one could imagine being used for packages.
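The first selection criterion above can be sketched as a simple filter. This is a minimal Python sketch, not code from this PR: `StemComparison` and `is_useful_test_case` are hypothetical names, and the sketch assumes that search matches a word to a query only when their stems are exactly equal, which may not be how the package index actually behaves. The second criterion (whether a word is plausible for a package) is left to human judgment. The only stems used are the ones from the "repeatedly" example in this thread.

```python
from typing import NamedTuple


class StemComparison(NamedTuple):
    """Stems for one word, plus the stem of a search term a user might type."""
    word: str
    porter_stem: str
    snowball_stem: str
    query_stem: str


def is_useful_test_case(c: StemComparison) -> bool:
    """Keep a word only if switching stemmers fixes a plausible search.

    Hypothetical assumption: a query matches a word only when their stems
    are equal. Under that assumption, a case is interesting when Snowball
    reduces the word to the query's stem while Porter leaves a longer stem
    that overshoots the query, so the old stemmer would miss the match.
    """
    snowball_matches = c.snowball_stem == c.query_stem
    porter_overshoots = (
        len(c.porter_stem) > len(c.query_stem)
        and c.porter_stem.startswith(c.query_stem)
    )
    return snowball_matches and porter_overshoots


# Stems from the "repeatedly" example: Porter gives "repeatedli",
# Snowball gives "repeat", and the search term "repeat" stems to "repeat".
case = StemComparison("repeatedly", "repeatedli", "repeat", "repeat")
print(is_useful_test_case(case))  # True
```

If Porter had already stemmed a word to the same thing as the search term, `porter_overshoots` would be false and the word would be skipped, which mirrors the reasoning that such cases wouldn't demonstrate a search improvement.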