
Optionally add trimmer to search pipeline

Open dhdaines opened this issue 1 year ago • 5 comments

Fixes #151 ~but breaks bug-compatibility with lunr.js~ when an option is enabled (which also works, most of the time, in lunr.js with serialized models from lunr.py). The JavaScript-side workaround is noted in https://github.com/olivernn/lunr.js/issues/532 ... will lunr.js get updated? Magic 8-ball says "UNLIKELY"

dhdaines, Jul 05 '24 16:07

Note that adding the stopword filter to search isn't really necessary since those terms just won't be in the index.

The trimmer, on the other hand, is really useful for the reason mentioned above.
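To illustrate why a trimmer matters at query time, here is a minimal, self-contained sketch. It does not use lunr.py at all: `trim`, `tokenize`, `build_index` and `search` are hypothetical toy helpers, and the regex is only an approximation of what a trimmer does (strip non-word characters from both ends of a token). The assumption is that the index is built with trimming while the query, by default, is not trimmed.

```python
import re

# Toy stand-in for a trimmer (NOT lunr.py's implementation):
# strip non-word characters from the start and end of a token.
_EDGE_NON_WORD = re.compile(r"^\W+|\W+$")


def trim(token):
    return _EDGE_NON_WORD.sub("", token)


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


def build_index(docs):
    # The indexing pipeline always trims, so "hello," is stored as "hello".
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            term = trim(token)
            if term:
                index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, trimmer_in_search=False):
    # Without the trimmer, a query token keeps its punctuation
    # and cannot match the trimmed terms stored in the index.
    hits = set()
    for token in tokenize(query):
        term = trim(token) if trimmer_in_search else token
        hits |= index.get(term, set())
    return hits


docs = {"a": "Hello, world!", "b": "Goodbye world"}
index = build_index(docs)

without_trimmer = search(index, "hello,")
with_trimmer = search(index, "hello,", trimmer_in_search=True)
print(without_trimmer, with_trimmer)
```

In this toy setup the query `hello,` finds nothing unless the search side trims too, which is the kind of mismatch the option is meant to avoid.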

But again ... this breaks compatibility with lunr.js so you probably shouldn't merge it!

dhdaines, Jul 06 '24 14:07

Updated this because the stopword filter actually isn't useful in the search pipeline. But the trimmer is!

dhdaines, Jul 06 '24 14:07

Updated again - the behaviour is disabled by default, but can be enabled with the `trimmer_in_search` argument to `get_default_builder`. The resulting models should also work in lunr.js except in the case where multiple languages are used (which maybe doesn't work in lunr.js anyway?)

dhdaines, Sep 09 '24 16:09

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 96.10%. Comparing base (d07b60f) to head (9d9a7ff).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #154      +/-   ##
==========================================
+ Coverage   96.02%   96.10%   +0.07%     
==========================================
  Files          48       48              
  Lines        3171     3206      +35     
==========================================
+ Hits         3045     3081      +36     
+ Misses        126      125       -1     


codecov-commenter, Sep 09 '24 16:09

The question then is whether you want the option to also add the stopword filter in search. As mentioned, it doesn't actually do anything, because those terms just won't be matched. Also, I'm not sure what happens with multi-language models in that case.
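A small sketch of why the stopword filter is a no-op at search time, again with hypothetical toy helpers rather than lunr.py itself. It assumes stop words are dropped at indexing time and that query terms are combined with OR semantics; with required terms the picture could differ.

```python
# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"the", "a", "of"}


def build_index(docs):
    # Stop words are dropped while indexing, so they never become terms.
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            if term not in STOP_WORDS:
                index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, filter_stop_words):
    terms = query.lower().split()
    if filter_stop_words:
        terms = [t for t in terms if t not in STOP_WORDS]
    # OR semantics: a term missing from the index contributes nothing.
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


docs = {"a": "the art of search", "b": "a search index"}
index = build_index(docs)

unfiltered = search(index, "the search", filter_stop_words=False)
filtered = search(index, "the search", filter_stop_words=True)
print(unfiltered == filtered)
```

Both calls return the same documents here, because `the` was never indexed in the first place.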

dhdaines, Sep 09 '24 16:09