
Division By Zero in def is_mispelling

Open gffde3 opened this issue 7 years ago • 12 comments

Hey,

I've been using your lib at 0.0.1 and updated recently (I had to hack some of the SQLite FTS keywords and will fix that up again), but I've come across a problem:

You get a division-by-zero error in tokencomparison.py -> def is_mispelling(self, token1, token2)

Here are the values of the vars in that function when it throws:

float division by zero
token1: 0
token2: 2
mis_t1: []
mis_t2: []
common: []

I know you're comparing distance for string tokens, but what is the logic for numeric values? How do you decide whether two numbers are misspellings of each other (even ignoring the 0 case)?

Even if you swap max() / min() for min() / max() and take the inverse, you'll still get 0 for 0 values.

Maybe an absolute difference is better, but that stuffs you up when a digit gets added by mistake (e.g. 1 mistyped as 10).

Maybe edit distance is still best used here?
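
The trade-offs above can be sketched with three stand-in comparisons. This is only an illustration: the function names, thresholds, and tolerances are hypothetical and not taken from the library. Only the max/min ratio test mirrors what this thread describes.

```python
def ratio_close(a: float, b: float, threshold: float = 1.01) -> bool:
    """Ratio test as described in this thread (threshold value is made up).

    Raises ZeroDivisionError whenever the smaller value is 0.
    """
    return max(a, b) / min(a, b) < threshold


def abs_diff_close(a: float, b: float, tolerance: float = 1.0) -> bool:
    """Absolute-difference test (hypothetical alternative)."""
    return abs(a - b) <= tolerance


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance between two tokens, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```

With these stand-ins, `ratio_close(0, 2)` raises ZeroDivisionError, `abs_diff_close(1, 10)` is False, and `edit_distance("1", "10")` is 1. Note that edit distance on single-digit tokens treats any two different digits as one edit apart, so it would need its own threshold rule.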

As an aside, thanks for making this library; it's saved me some time so far :)

gffde3 avatar Jan 13 '18 13:01 gffde3

So I just caught the division-by-zero exception and returned False. Seems to work alright.

gffde3 avatar Jan 18 '18 11:01 gffde3

I had this same issue but can't seem to replicate your fix. Do you mind posting the snippet of the is_mispelling function that you changed?

And thank you to you both, for making this package and working on this issue, as it would be a huge help.

lalalandau avatar Jan 24 '18 22:01 lalalandau

This bit seemed to work for me, though not sure if it is the most efficient:

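        # Zero cannot be the ratio's denominator, so treat it as "not a misspelling".
        if t1f == 0 or t2f == 0:
            return False

        # Numbers count as misspellings when their max/min ratio is under the threshold.
        return max(t1f, t2f) / min(t1f, t2f) < self.number_fuzz_threshold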

jacobod avatar Mar 12 '18 21:03 jacobod

I'm also getting the ZeroDivisionError and can't figure out how to get around it while still returning the correctly linked dataframe. The earlier comment mentioned handling the div 0 exception by returning False, and I would also like to see a snippet showing what to change. I've tried the snippet above, but the same issue persisted.

junaidahmed361 avatar May 02 '18 00:05 junaidahmed361

As pointed out by @gffde3, I added:

except ZeroDivisionError:
    pass

on line 40 of tokencomparison.py and it did the trick. 🎉

ghost avatar Sep 07 '18 14:09 ghost

This worked for me too, many thanks @gregobf

kennethzhu88 avatar Oct 03 '18 07:10 kennethzhu88

I think changing line 42 to this is a little cleaner than adding a whole new exception line:

except (ValueError, ZeroDivisionError):
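
A minimal sketch of that pattern, written as a standalone function rather than the library's method. The function name, the default threshold, and the float conversion inside the try block are assumptions for illustration; the real body of is_mispelling is not shown in this thread.

```python
def numbers_look_alike(token1: str, token2: str, threshold: float = 1.01) -> bool:
    """Hypothetical stand-in for the numeric branch of is_mispelling."""
    try:
        # float() raises ValueError for non-numeric tokens.
        t1f = float(token1)
        t2f = float(token2)
        # The division raises ZeroDivisionError when the smaller value is 0.
        return max(t1f, t2f) / min(t1f, t2f) < threshold
    except (ValueError, ZeroDivisionError):
        # Either failure means "not a numeric misspelling".
        return False
```

One consequence of this choice: two zero tokens also return False here, since 0 / 0 raises as well.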

chris1610 avatar Dec 01 '18 20:12 chris1610

Closed by #43

RobinL avatar Feb 22 '19 09:02 RobinL

Thanks @chris1610, and thanks to everyone who reported this

RobinL avatar Feb 22 '19 09:02 RobinL

I am still getting this error despite the update to tokencomparison.py (a ZeroDivisionError on line 40, as noted above). Note that I pip-installed the package, so perhaps that is the issue. Any help is much appreciated!
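
One way to check whether the installed copy actually contains the fix is to look at the source file Python would import. The helper below is a hypothetical sketch using only the standard library; the module path `fuzzymatcher.tokencomparison` in the usage note is an assumption based on the file name mentioned in this thread.

```python
import importlib.util
from pathlib import Path
from typing import Tuple


def installed_source_mentions(module_name: str, needle: str) -> Tuple[str, bool]:
    """Return the installed module's file path and whether its source contains `needle`."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"{module_name} is not importable here")
    path = Path(spec.origin)
    # Read the source text without executing the module.
    return str(path), needle in path.read_text(encoding="utf-8")
```

For example, `installed_source_mentions("fuzzymatcher.tokencomparison", "ZeroDivisionError")` would return the path of the installed file and False if that copy has no handler for the error, which would suggest the installed release predates the fix.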

7cb15 avatar Mar 27 '19 13:03 7cb15

Same here, I used regular pip install and pulled from GitHub.

ghost avatar Apr 23 '19 14:04 ghost

Same here, in colab through pip

kanlancb avatar Apr 26 '22 08:04 kanlancb