nucliadb

feature: [sc-2711] Remove prepositions on paragraph search fulltext

Open · alekece opened this pull request 3 years ago · 3 comments

Description

This PR aims to remove stop words from paragraphs to reduce noise in fulltext search.
My approach is to keep a static list of stop words, compute the maximum frequency among non-stop-word terms, and then remove every stop-word term whose frequency is higher than that maximum. This removes most of the noise from the search while keeping stop words that occur no more often than the paragraph's most frequent content term.
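
A minimal Python sketch of the filtering rule described above, assuming the paragraph has already been tokenized into lowercase terms. `STOP_WORDS` and `filter_noisy_stop_words` are hypothetical names, and the stop-word list is a small placeholder, not the static list used in this PR; this is an illustration of the rule, not the PR's code.

```python
from collections import Counter

# Hypothetical placeholder list; the PR's actual static stop-word list is not shown here.
STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "on", "the", "to"})


def filter_noisy_stop_words(
    terms: list[str], stop_words: frozenset[str] = STOP_WORDS
) -> list[str]:
    """Drop stop-word terms that occur more often than the most frequent non-stop-word term.

    Assumes `terms` is an already tokenized, lowercased paragraph.
    """
    frequencies = Counter(terms)
    # Highest frequency among content (non-stop-word) terms; 0 if there are none,
    # in which case every stop word is dropped.
    max_content_freq = max(
        (count for term, count in frequencies.items() if term not in stop_words),
        default=0,
    )
    # Keep content terms unconditionally; keep a stop word only if it is not
    # more frequent than the most frequent content term.
    return [
        term
        for term in terms
        if term not in stop_words or frequencies[term] <= max_content_freq
    ]


if __name__ == "__main__":
    paragraph = "the cat sat on the mat and the dog sat on the rug".split()
    print(filter_noisy_stop_words(paragraph))
```

In this example "the" occurs four times, more than "sat" (the most frequent content term, two occurrences), so every "the" is dropped, while "on" (two) and "and" (one) survive. Because the threshold is computed per paragraph, a stop word is treated as noise only relative to that paragraph's own content terms.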

How was this PR tested?

I added some unit tests but did not test the new search results.

alekece, Oct 19 '22 15:10

Codecov Report

Base: 67.25% // Head: 67.25% // No change to project coverage

Coverage data is based on head (e293bce) compared to base (d832456). Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #319   +/-   ##
=======================================
  Coverage   67.25%   67.25%           
=======================================
  Files         209      209           
  Lines       14089    14089           
=======================================
  Hits         9475     9475           
  Misses       4614     4614           
Flag        Coverage Δ
nucliadb    52.59% <ø> (ø)

Flags with carried forward coverage won't be shown.


codecov[bot], Oct 19 '22 16:10

CLA assistant check
All committers have signed the CLA.

CLAassistant, Oct 21 '22 08:10