Support Stemming and Stopwords in Search
We should add support for stemming and stopword removal. We should also consider switching search ranking from raw term frequency to some variant of TFIDF. This would normalise term frequency across documents, which should hopefully filter out common words. We could also consider some kind of cutoff to prevent words common to all texts from being included in the index.
Originally posted by @iwillspeak in https://github.com/iwillspeak/docket/pull/20#discussion_r922829088
- [ ] Stemming and stopword removal
- [ ] TFIDF
- [ ] Drop insignificant terms?
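
A rough sketch of how the three items could fit together, written in Python purely for illustration and not reflecting docket's actual code. Everything here is an assumption: the `STOPWORDS` set, the crude `stem` function (a stand-in for a real stemmer such as Porter), the `build_index` name, and the `max_df_ratio` cutoff parameter are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical stopword list; a real index would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenise(text):
    """Lowercase, strip punctuation, drop stopwords, then stem."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [stem(w) for w in words if w and w not in STOPWORDS]


def build_index(docs, max_df_ratio=0.9):
    """Return {term: {doc_id: tf-idf weight}} for a {doc_id: text} mapping.

    Terms that appear in more than `max_df_ratio` of the documents are
    left out of the index entirely.
    """
    term_counts = {doc_id: Counter(tokenise(text)) for doc_id, text in docs.items()}

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    index = {}
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        for term, count in counts.items():
            df = doc_freq[term]
            if df / n_docs > max_df_ratio:
                continue  # too common to be useful for ranking
            tf = count / total
            idf = math.log(n_docs / df)
            index.setdefault(term, {})[doc_id] = tf * idf
    return index
```

With plain `log(N / df)` as the IDF, a term present in every document already scores zero, so the explicit cutoff mostly saves index space rather than changing the ranking. Whether that is worth the extra parameter is part of the open question in the last checklist item.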