Support rapidfuzz>=2.8.0

Open tstadel opened this issue 2 years ago • 0 comments

Is your feature request related to a problem? Please describe.

rapidfuzz 2.8.0 introduced a fix that should make the custom implementation of boost_split_overlap in haystack.utils.calculate_context_similarity obsolete.

Describe the solution you'd like

  • Remove the rapidfuzz<2.8.0 version pin
  • Remove the custom implementation of boost_split_overlap in haystack.utils.calculate_context_similarity
  • Find an appropriate threshold for similarity scores (currently 65) so that all tests in others/test_utils.py pass, and set it as the default value in haystack.utils.match_context, haystack.utils.match_contexts, haystack.Pipeline.eval, haystack.Pipeline.execute_eval_run and haystack.Pipeline._build_eval_dataframe
  • Make the similarity tests that use numpy.random deterministic by calling numpy.random.seed before they run
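
For the last point, a minimal sketch of what seeding could look like. The helper make_noisy_context and the sample text are hypothetical and only stand in for whatever the tests in others/test_utils.py do with numpy.random; the only real API used is numpy.random.seed and numpy.random.choice.

```python
import numpy as np


def make_noisy_context(text: str, n_typos: int, seed: int) -> str:
    """Replace `n_typos` characters of `text` at random positions.

    Hypothetical helper for illustration only; it is not part of haystack.
    """
    # Fix the global numpy.random state before the first random call,
    # so the same positions are drawn on every run.
    np.random.seed(seed)
    chars = list(text)
    positions = np.random.choice(len(chars), size=n_typos, replace=False)
    for pos in positions:
        chars[pos] = "#"
    return "".join(chars)


context = "The quick brown fox jumps over the lazy dog."
first = make_noisy_context(context, n_typos=5, seed=42)
second = make_noisy_context(context, n_typos=5, seed=42)

# Same seed, same perturbation: the similarity score computed on it
# no longer varies between test runs.
assert first == second
```

In a test suite the seed call would more likely live in a setup step or fixture than in the helper itself; either way it has to run before the first draw from numpy.random.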

Describe alternatives you've considered

  • Keep the version pin

Additional context

Tests in others/test_utils.py about context similarity should not need to be changed. Getting rid of some imprecisions (e.g. raising accuracy assessments in tests from 99% to 100%) would be appreciated.

tstadel, Sep 12 '22 14:09