
Support Stemming and Stopwords in Search

Open iwillspeak opened this issue 2 years ago • 0 comments

We should add support for stemming and stopword removal. We should also consider switching from raw term frequency to some variant of TF-IDF for search. This would normalise search term frequency across documents, which should hopefully filter out common words. We could also consider some kind of cutoff to prevent words common to all texts from being included in the index.
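As a rough illustration of the TF-IDF idea and the cutoff, here is a minimal sketch in Python. It is not docket's code: the function name `build_tfidf_index`, the `max_df_ratio` parameter, and the choice of `tf = count / document length` with `idf = log(N / df)` are all assumptions made for the example, and other TF-IDF variants would work equally well.

```python
import math
from collections import Counter


def build_tfidf_index(docs, max_df_ratio=0.9):
    """Build a term -> {doc_id: weight} index (hypothetical helper).

    docs maps a document id to its list of tokens. Terms found in more
    than max_df_ratio of the documents are left out of the index.
    """
    n_docs = len(docs)

    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    index = {}
    for doc_id, tokens in docs.items():
        for term, count in Counter(tokens).items():
            # Cutoff: skip terms that are common to (nearly) all texts.
            if df[term] / n_docs > max_df_ratio:
                continue
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term])
            index.setdefault(term, {})[doc_id] = tf * idf
    return index


docs = {
    "a": ["the", "search", "index"],
    "b": ["the", "markdown", "page"],
    "c": ["the", "search", "page"],
}
index = build_tfidf_index(docs)
```

With this particular `idf`, a term that appears in every document already gets a weight of zero, so the cutoff mainly keeps such terms from taking up space in the index. In the sample above, `the` is dropped entirely and `markdown`, which occurs in only one document, ends up weighted higher than `search`.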

Originally posted by @iwillspeak in https://github.com/iwillspeak/docket/pull/20#discussion_r922829088

  • [ ] Stemming and stopword removal
  • [ ] TF-IDF
  • [ ] Drop insignificant terms?
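For the first item, a minimal sketch of what the normalisation step could look like, written in Python for brevity. Everything here is an assumption for illustration: the stopword list is a tiny sample, `naive_stem` is a crude suffix stripper rather than a real stemming algorithm such as Porter's, and the names `STOPWORDS`, `naive_stem` and `normalise` are hypothetical.

```python
import re

# Tiny sample list; a real one would be much longer.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it", "for"})

# Suffixes tried in order; the first one that matches is stripped.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalise(text):
    """Lowercase, split into words, drop stopwords, then stem the rest."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOPWORDS]


terms = normalise("Searching the indexed documents")
```

The same function would need to run over both the indexed text and the search query, so that `searched` and `searching` reduce to the same term on both sides.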

iwillspeak • Aug 28 '22 10:08