
Unexpected results when specifying stop words as Regexps

Open rob99 opened this issue 8 years ago • 2 comments

FuzzyMatch.new(['AAI Limited', 'LITED'], :stop_words=>['limited']).find('AAI Limited')
=> "AAI Limited"  # good


FuzzyMatch.new(['AAI Limited', 'LITED'], :stop_words=>[/limited/i]).find('AAI Limited')
=> "LITED"  # bad

I would expect the same result in either case, given the absence of special characters in the regexp.

rob99, Mar 24 '16 03:03

Also found:

FuzzyMatch.new(['AAI Limited', 'LITED'], :stop_words=>[/limited/]).find('AAI Limited')
=> "AAI Limited"

So the case insensitive modifier seems to be having an undesirable impact...

rob99, Mar 24 '16 04:03

TL;DR: This was fixed as of 4f914f2 (7/20/2015), but appears not to have been updated on rubygems. To use the most recent version, try the following line in your Gemfile:

gem 'fuzzy_match', :git => 'https://github.com/seamusabshere/fuzzy_match.git'

To be clear, the issue is not with the /i Regexp flag. Rather, [/limited/i] is the only version of the stop word that works! You can try it yourself:

pry(main)> FuzzyMatch.new(['AAI Limited', 'LITED'], stop_words: [/limited/i])
=> #<FuzzyMatch:0x007fd394393388
 ... @haystack=[w("AAI"), w("LITED")], ... >

vs

pry(main)> FuzzyMatch.new(['AAI Limited', 'LITED'], stop_words: [/limited/])
=> #<FuzzyMatch:0x007fd394393388
 ... @haystack=[w("AAI Limited"), w("LITED")], ... >
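The difference between the two haystacks comes down to ordinary Ruby Regexp matching, independent of fuzzy_match: without the /i flag the pattern is case-sensitive, so it never matches the capitalized "Limited". A minimal check using only core Ruby (no gem calls involved):

```ruby
name = 'AAI Limited'

# Case-sensitive: lowercase "limited" does not occur in the name.
case_sensitive = name.match?(/limited/)

# Case-insensitive: the /i flag lets the pattern match "Limited".
case_insensitive = name.match?(/limited/i)

puts "without /i: #{case_sensitive}"
puts "with /i:    #{case_insensitive}"
```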

The problem is that when you call #find('AAI Limited'), the old version (the one on rubygems) filters the stop word out of the ‘haystack’ only, and not out of the ‘needle’. Thus, applying the stop word makes it search for ‘AAI Limited’ among ‘AAI’ / ‘LITED’, when it should be searching for just ‘AAI’.
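To make the needle/haystack mismatch concrete, here is a rough sketch in plain Ruby. It does not use fuzzy_match's internals: `strip_stop_words` is a hypothetical helper written for this illustration, and exact equality via `include?` stands in for the gem's fuzzy scoring, which is an assumption made only to keep the example small.

```ruby
# Hypothetical helper, not part of fuzzy_match: removes every stop word
# (String or Regexp) from the text and tidies the leftover whitespace.
def strip_stop_words(text, stop_words)
  stop_words.reduce(text) { |acc, sw| acc.gsub(sw, '') }.squeeze(' ').strip
end

stop_words = [/limited/i]
haystack   = ['AAI Limited', 'LITED']
needle     = 'AAI Limited'

# Both the old and the fixed behavior filter the haystack.
filtered_haystack = haystack.map { |s| strip_stop_words(s, stop_words) }

# Old behavior as described above: the needle is left untouched.
old_needle = needle

# Fixed behavior as described above: the needle is filtered too.
new_needle = strip_stop_words(needle, stop_words)

puts filtered_haystack.inspect
puts "old needle found exactly? #{filtered_haystack.include?(old_needle)}"
puts "new needle found exactly? #{filtered_haystack.include?(new_needle)}"
```

With the unfiltered needle, nothing in the filtered haystack is an exact match any more, so the result depends entirely on how the remaining candidates happen to score. Filtering the needle as well restores the direct match on "AAI".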

rlue, Jan 16 '17 02:01