haystack
Support rapidfuzz>=2.8.0
Is your feature request related to a problem? Please describe.
rapidfuzz 2.8.0 introduced a fix that should make the custom implementation of `boost_split_overlap` in `haystack.utils.calculate_context_similarity` obsolete.
Describe the solution you'd like
- Remove the version pin `rapidfuzz<2.8.0`
- Remove the custom implementation of `boost_split_overlap` in `haystack.utils.calculate_context_similarity`
- Find an appropriate threshold for similarity scores (currently 65) so that all tests in `others/test_utils.py` pass, and set this threshold as the default value in `haystack.utils.match_context`, `haystack.utils.match_contexts`, `haystack.Pipeline.eval`, `haystack.Pipeline.execute_eval_run` and `haystack.Pipeline._build_eval_dataframe`
- Make the similarity tests that use `numpy.random` deterministic by setting `numpy.random.seed` before executing them
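A minimal sketch of the last two points, a fixed threshold plus a seeded test, assuming a `difflib`-based stand-in scorer on a 0-100 scale in place of rapidfuzz. The helpers `similarity_score`, `random_text` and `similarity_between_random_texts` are hypothetical and not haystack APIs; only the threshold value 65 comes from this issue.

```python
import difflib

import numpy as np

# Assumption: mirrors the current default threshold mentioned in the issue
SIMILARITY_THRESHOLD = 65.0


def similarity_score(a: str, b: str) -> float:
    # Hypothetical stand-in scorer on a 0-100 scale; haystack itself uses rapidfuzz
    return 100.0 * difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def random_text(n_words: int) -> str:
    # Hypothetical test-input generator that draws from the global numpy RNG
    vocab = ["alpha", "beta", "gamma", "delta", "epsilon"]
    return " ".join(np.random.choice(vocab, size=n_words))


def similarity_between_random_texts(seed: int) -> float:
    # Seeding the global RNG first makes both texts, and therefore the score,
    # identical on every run of the test
    np.random.seed(seed)
    a = random_text(20)
    b = random_text(20)
    return similarity_score(a, b)


def is_match(a: str, b: str, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    # Hypothetical threshold check: a pair counts as a match at or above the threshold
    return similarity_score(a, b) >= threshold
```

Without the `np.random.seed` call, `similarity_between_random_texts` would return a different score on each run, which is what makes a hard accuracy assertion flaky.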
Describe alternatives you've considered
- Keep the version pin
Additional context
Tests in `others/test_utils.py` about context similarity should not need to be changed. Getting rid of some imprecisions (e.g. tightening accuracy assertions in tests from 99% to 100%) would be appreciated.