nucliadb
feature: [sc-2711] Remove prepositions on paragraph search fulltext
Description
This PR aims to remove stop words from paragraph search.
My approach is to keep a static list of stop words, compute the maximum frequency among the non-stop-word terms, and then remove every stop-word term whose frequency is higher than that maximum. This removes most of the noise from the search while keeping stop words that are no more frequent than the regular terms.
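The filtering rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in this PR: the `STOP_WORDS` list, the function name `remove_noisy_stop_words`, and the idea of passing term frequencies in as a plain mapping are all hypothetical, since the description does not say where frequencies come from. The sketch also assumes terms are already lowercased, and it leaves the query untouched when it contains only stop words, which is a choice made here rather than something the PR states.

```python
from typing import List, Mapping, Sequence

# Hypothetical, deliberately small stop-word list (the PR's real list is not shown).
STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "on", "the", "to"})


def remove_noisy_stop_words(
    terms: Sequence[str], frequency: Mapping[str, int]
) -> List[str]:
    """Drop stop words that are more frequent than every non-stop-word term.

    `frequency` maps a term to its frequency; unknown terms count as 0.
    """
    content_freqs = [frequency.get(t, 0) for t in terms if t not in STOP_WORDS]

    # Assumption of this sketch: a query made only of stop words is kept as-is,
    # so that filtering never produces an empty query.
    if not content_freqs:
        return list(terms)

    max_content_freq = max(content_freqs)

    # Keep every regular term, and keep a stop word only if it is not more
    # frequent than the most frequent regular term.
    return [
        t
        for t in terms
        if t not in STOP_WORDS or frequency.get(t, 0) <= max_content_freq
    ]


if __name__ == "__main__":
    freqs = {"the": 900, "history": 50, "of": 40, "rome": 12}
    print(remove_noisy_stop_words(["the", "history", "of", "rome"], freqs))
```

With these made-up frequencies, "the" is dropped because it is more frequent than "history" (the most frequent regular term), while "of" survives because it is not.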
How was this PR tested?
I added some unit tests but did not test the new search results.
This pull request has been linked to Shortcut Story #2711: Prepositions on paragraph search / fulltext.
Codecov Report
Base: 67.25% // Head: 67.25% // No change to project coverage :thumbsup:
Coverage data is based on head (e293bce) compared to base (d832456). Patch has no changes to coverable lines.
Additional details and impacted files
@@ Coverage Diff @@
## main #319 +/- ##
=======================================
Coverage 67.25% 67.25%
=======================================
Files 209 209
Lines 14089 14089
=======================================
Hits 9475 9475
Misses 4614 4614
| Flag | Coverage Δ | |
|---|---|---|
| nucliadb | 52.59% <ø> (ø) |