fast-autocomplete

Adding an underscore character to valid characters ignores the underscore

Open lazzarello opened this issue 3 years ago • 1 comment

Describe the bug

Adding a special character (an underscore) to valid_chars_for_string does not exclude results that lack that character until the input has two mismatched characters.

To Reproduce

Initialized with

import string
from fast_autocomplete import AutoComplete
_valid_chars = '_' + string.ascii_lowercase
words = {'i_love_code': {'count': 5}, 'island': {'count': 2}, 'ironman': {'count': 2}, 'i_love_coding': {'count': 2}, 'i_love_machine_learning': {'count': 3}}
autocomplete = AutoComplete(words=words, synonyms={}, valid_chars_for_string=_valid_chars)
search_string = 'iron_'  # one of the simulated inputs listed below
autocomplete.search(word=search_string, max_cost=1, size=10)

Formatted output with simulated input:

Valid Characters: _abcdefghijklmnopqrstuvwxyz
Search Input 'i' : i_love_code, i_love_machine_learning, island, ironman, i_love_coding
Search Input 'ir' : ironman
Search Input 'iro' : ironman
Search Input 'iron' : ironman
Search Input 'iron_' : ironman
Search Input 'iron_m' : ironman
Search Input 'iron_ma' : 
Search Input 'iron_mai' : 
Search Input 'iron_maid' : 
Search Input 'iron_maide' : 
Search Input 'iron_maiden' : 

Search Input 'i' : i_love_code, i_love_machine_learning, island, ironman, i_love_coding, iron_maiden

Expected behavior

Much like the input 'ir' excludes 'i_love_code', I would expect 'iron_' to exclude 'ironman', and so forth. From this output, it looks like 'ironman' is only excluded once the input reaches 'iron_ma'.

OS, DeepDiff version and Python version (please complete the following information):

  • OS: Ubuntu 21.04 + pyenv
  • Python 3.9.7

Additional context

This seems to have something to do with the max_cost parameter. If I raise it above 2, it matches even more unexpected results.
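To make the idea of an edit-cost budget concrete, here is a minimal sketch using plain Levenshtein distance. It is not fast-autocomplete's internal matching logic, and it does not reproduce the exact cutoff seen above; the helper names `levenshtein` and `min_prefix_distance` are made up for illustration. It only shows that an input such as 'iron_m' is a single edit away from a prefix of 'ironman', which is the kind of closeness a fuzzy matcher with a cost budget of 1 could accept.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def min_prefix_distance(query: str, word: str) -> int:
    """Smallest edit distance between query and any prefix of word.

    Hypothetical helper: a stand-in for "how far is this input from
    being a prefix of a stored word", not the library's algorithm.
    """
    return min(levenshtein(query, word[:k]) for k in range(len(word) + 1))


if __name__ == "__main__":
    for query in ("iron", "iron_", "iron_m", "iron_maiden"):
        print(query, min_prefix_distance(query, "ironman"))
```

Under this simplified measure the underscore counts as one ordinary edit, so whether 'ironman' is returned depends entirely on how the cost budget is applied, which is consistent with the observation that raising max_cost widens the results.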

lazzarello, May 03 '22 22:05

Hi @lazzarello. The fuzzy matching logic still sees enough similarity between the input and 'ironman' to include it in the results. You are right that the underscore character is treated differently. That's because internally we convert all spaces into underscores. Maybe we should switch from using the underscore for that purpose to a Unicode character that is rarely used.

seperman, Dec 09 '22 18:12
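The placeholder idea from the reply above can be sketched as follows. This is an illustration only, not code from fast-autocomplete: the function `encode_spaces` and the choice of U+241F (SYMBOL FOR UNIT SEPARATOR) as the rarely used character are assumptions. It shows why mapping spaces to underscores makes 'iron man' and 'iron_man' indistinguishable, and how a dedicated placeholder keeps them apart.

```python
# Hypothetical placeholder; any character unlikely to appear in real input would do.
RARE_PLACEHOLDER = "\u241f"


def encode_spaces(text: str, placeholder: str) -> str:
    """Replace every space with the given placeholder character."""
    return text.replace(" ", placeholder)


if __name__ == "__main__":
    # With '_' as the placeholder, a phrase with a space collides with
    # a word that contains a literal underscore.
    print(encode_spaces("iron man", "_") == encode_spaces("iron_man", "_"))

    # With a dedicated placeholder, the two stay distinct.
    print(encode_spaces("iron man", RARE_PLACEHOLDER)
          == encode_spaces("iron_man", RARE_PLACEHOLDER))
```

The trade-off is that the placeholder must never occur in user-supplied words, so a real change would probably also need to reject or escape that character on input.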