fuzzymatcher
Division By Zero in def is_mispelling
Hey,
I've been using your lib on 0.0.1 and just updated recently (I had to hack some of the SQLite fts keywords and will fix that up again) but I've come across a problem:
You get a div zero error in tokencomparison.py -> def is_mispelling(self, token1, token2)
Here are the values of the vars in that function when it throws:
float division by zero
token1: 0
token2: 2
mis_t1: []
mis_t2: []
common: []
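For anyone trying to reproduce this outside the library, here is a minimal standalone sketch of the ratio check with the values above. It is not the library's actual code: `ratio_is_close` and the default threshold of 1.01 are hypothetical stand-ins for the numeric branch of `is_mispelling` and `self.number_fuzz_threshold`.

```python
def ratio_is_close(token1, token2, threshold=1.01):
    """Hypothetical stand-in for the numeric branch of is_mispelling."""
    t1f = float(token1)
    t2f = float(token2)
    # When either token is 0 (and neither is negative), min() is 0
    # and the division raises ZeroDivisionError.
    return max(t1f, t2f) / min(t1f, t2f) < threshold


try:
    ratio_is_close("0", "2")
except ZeroDivisionError as err:
    print("raised:", err)
```

Any pair of non-negative tokens where one of them is 0 hits the same path, so this is not specific to the value 2.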
I know you're comparing distance for string tokens, but what is the logic for numeric values? How do you determine whether two numbers are misspellings of each other (even ignoring the 0 value)?
Even if you swap the max( ) / min ( ) to min ( ) / max ( ) and take the inverse you'll still get 0 for 0 values.
Maybe an absolute difference is better, but that breaks down when a digit is added by mistake (e.g. 1 typed as 10).
Maybe edit distance is still best used here?
As an aside, thanks for making this library; it's saved me some time so far :)
So I just set the exception for div 0 to return False. Seems to work alright.
I had this same issue but can't seem to replicate your fix. Do you mind posting the snippet of the is_mispelling function that you changed?
And thank you to you both, for making this package and working on this issue, as it would be a huge help.
This bit seemed to work for me, though not sure if it is the most efficient:
    if t1f == 0 or t2f == 0:
        return False
    else:
        if max(t1f, t2f)/min(t1f, t2f) < self.number_fuzz_threshold:
            return True
        else:
            return False
I'm also getting the ZeroDivisionError and can't figure out how to avoid it while still returning the correctly linked dataframe. I saw the earlier comment about changing the exception for div 0 to return False, and I would also like to see a snippet showing what to change and how. I've tried the snippet above, but the same issue persisted.
As pointed out by @gffde3, I added:
    except ZeroDivisionError:
        pass
on line 40 of tokencomparison.py
and it did the trick. 🎉
This works for me too, many thanks @gregobf
I think changing line 42 to this is a little cleaner than adding a whole new exception line:
except (ValueError, ZeroDivisionError):
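In context, the combined handler could look something like the sketch below. This is a standalone approximation under assumptions, not the library's code: `is_number_misspelling` is a hypothetical free function, the default threshold of 1.01 is made up for illustration, and returning False from the handler follows the earlier "return False" suggestion in this thread.

```python
def is_number_misspelling(token1, token2, number_fuzz_threshold=1.01):
    """Hypothetical standalone version of the numeric check."""
    try:
        t1f = float(token1)   # ValueError if the token is not numeric
        t2f = float(token2)
        # ZeroDivisionError if the smaller value is 0
        return max(t1f, t2f) / min(t1f, t2f) < number_fuzz_threshold
    except (ValueError, ZeroDivisionError):
        # Either case is treated as "not a numeric misspelling".
        return False


print(is_number_misspelling("0", "2"))
print(is_number_misspelling("abc", "2"))
print(is_number_misspelling("100", "100.5"))
```

One tuple in the `except` clause keeps both failure modes on the same code path, so no separate handler is needed.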
Closed by #43
Thanks @chris1610, and thanks to everyone who reported this
I am still getting this error despite the update to tokencomparison.py (the error is a ZeroDivisionError on line 40, as noted above). Note that I pip installed the package, so perhaps that is the issue. Any help is much appreciated!
Same here, I used regular pip install and pulled from GitHub.
Same here, in colab through pip
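If the installed copy might not include the fix, one way to check is to look at the source that Python actually imports. A rough sketch: `source_mentions` is a hypothetical helper, and the module path `fuzzymatcher.tokencomparison` is only inferred from the file name mentioned in this thread, so treat it as an assumption.

```python
import importlib
import inspect


def source_mentions(module_name, needle):
    """Return True if the imported module's source contains `needle`."""
    module = importlib.import_module(module_name)
    return needle in inspect.getsource(module)


# Example (assumes the package is installed and the module path is right):
# print(source_mentions("fuzzymatcher.tokencomparison", "ZeroDivisionError"))
```

If this prints False, the installed file does not handle ZeroDivisionError, which would explain still seeing the error after a pip install.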