RuntimeWarning: divide by zero encountered in true_divide & invalid value encountered in multiply

Open rraadd88 opened this issue 4 years ago • 4 comments

Hi @BenjaminDoran, I find unidip very helpful for identifying unimodal distributions. Thanks for developing this package!

I get the following warnings when I run `dip.diptst`:

```
/python3.6/site-packages/unidip/dip.py:27: RuntimeWarning: divide by zero encountered in true_divide
  slopes = (work_cdf[1:] - work_cdf[0]) / distances
/python3.6/site-packages/unidip/dip.py:30: RuntimeWarning: invalid value encountered in multiply
  gcm.extend(work_cdf[0] + distances[:minslope_idx] * minslope)
```

After the warning, the process stalls for a very long time. Looking at the input values, I couldn't pinpoint why this happens. Any suggestions on how to avoid this warning?

rraadd88 • Jun 26 '20 20:06

Update on avoiding the warning: the warnings can be suppressed as follows.

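```python
import warnings

# Suppress every RuntimeWarning (including NumPy's divide-by-zero and
# invalid-value warnings) for the rest of the session.
warnings.filterwarnings("ignore", category=RuntimeWarning)
```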

This is helpful when running the test inside a long (unbreakable) for loop.
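A narrower alternative to the global filter is to silence `RuntimeWarning` only around the call itself, so warnings from other code still show up. The sketch below uses the standard-library `warnings.catch_warnings()` context manager; `call_without_runtime_warnings` is a hypothetical helper name, and `np.divide` stands in for `dip.diptst` so the example runs without unidip. For NumPy-only code, `np.errstate(divide="ignore", invalid="ignore")` is another option.

```python
import warnings

import numpy as np


def call_without_runtime_warnings(func, *args, **kwargs):
    """Call func with RuntimeWarnings silenced only for this call (hypothetical helper)."""
    with warnings.catch_warnings():
        # The filter change is undone automatically when the with-block exits.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return func(*args, **kwargs)


# np.divide stands in for dip.diptst here: 1.0 / 0.0 would normally emit a warning.
result = call_without_runtime_warnings(np.divide, np.array([1.0, 2.0]), np.array([0.0, 1.0]))
print(result)
```

Inside a loop, this would look like `call_without_runtime_warnings(dip.diptst, data)` for each dataset, assuming `diptst` is called with a single data argument as in the demo code in this thread.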

However, I am still not sure why these warnings are raised in the first place.
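Rather than hiding the warning, one way to find out which input triggers it is to escalate `RuntimeWarning` to an exception one call at a time. In this sketch, `first_warning_index` is a hypothetical helper; in practice `compute` would be something like `dip.diptst` (an assumption), but the example uses a plain NumPy division so it runs without unidip.

```python
import warnings

import numpy as np


def first_warning_index(batches, compute):
    """Return the index of the first batch for which compute() emits a RuntimeWarning.

    Returns None if every batch runs cleanly. Hypothetical debugging helper.
    """
    for i, batch in enumerate(batches):
        with warnings.catch_warnings():
            # Turn RuntimeWarning into an exception only inside this block.
            warnings.simplefilter("error", category=RuntimeWarning)
            try:
                compute(batch)
            except RuntimeWarning:
                return i
    return None


batches = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 1.0])]
# The second batch has a repeated value, so np.diff yields a 0 and 1 / 0 warns.
print(first_warning_index(batches, lambda a: 1.0 / np.diff(a)))  # 1
```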

rraadd88 • Jun 27 '20 03:06

@rraadd88, could you please post an example input that generates this error?

tompollard • Jun 27 '20 03:06

Hi @tompollard, here's a table with example input that produces the warnings: dip_test.txt

Demo code:

```python
import pandas as pd
import unidip.dip as dip

# dip_test.txt is the tab-separated table attached above.
df = pd.read_table('dip_test.txt')
dip.diptst(df['col1'])
```

Output:

```
python3.6/site-packages/unidip/dip.py:27: RuntimeWarning: divide by zero encountered in true_divide
  slopes = (work_cdf[1:] - work_cdf[0]) / distances
python3.6/site-packages/unidip/dip.py:30: RuntimeWarning: invalid value encountered in multiply
  gcm.extend(work_cdf[0] + distances[:minslope_idx] * minslope)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-23-27275520ec3c> in <module>
      1 import unidip.dip as dip
----> 2 dip.diptst(df['col1'])

python3.6/site-packages/unidip/dip.py in diptst(dat, is_hist, numt)
     45     """ diptest with pval """
     46     # sample dip
---> 47     d, (_, idxs, left, _, right, _) = dip_fn(dat, is_hist)
     48 
     49     # simulate from null uniform

python3.6/site-packages/unidip/dip.py in dip_fn(dat, is_hist, just_dip)
    106             d = d_right
    107         else:
--> 108             xl = left_touchpoints[d_left == left_diffs][0]
    109             xr = right_touchpoints[right_touchpoints >= xl][0]
    110             d = d_left

IndexError: index 0 is out of bounds for axis 0 with size 0
```

Thanks for the reply!

rraadd88 • Jun 27 '20 23:06

I had the same issue with the dip test. For me, the problem was related to the floating-point precision of the `idxs` array: within the `_lcm_` method, the subtraction `idxs.max() - idxs[::-1]` created new duplicates.
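The precision effect can be reproduced in isolation. The values below are made up for illustration (not taken from dip_test.txt): `1.0` and the next representable double after it are distinct, but after subtracting both from a much larger maximum, as in `idxs.max() - idxs[::-1]`, they round to the same result.

```python
import numpy as np

# Illustrative values: 1.0, the next double after 1.0, and 1000.0.
idxs = np.array([1.0, np.nextafter(1.0, 2.0), 1000.0])
print(len(np.unique(idxs)))  # 3 distinct values

# Reflect around the maximum, as in idxs.max() - idxs[::-1].
reflected = idxs.max() - idxs[::-1]
# 1000 - 1.0 and 1000 - nextafter(1.0, 2.0) both round to 999.0.
print(len(np.unique(reflected)))  # 2
```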

This resulted in distances of 0 in the `_gcm_` method, which caused the division by 0.
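A generic NumPy illustration of the two warnings (not unidip's actual `_gcm_` code; the arrays are made-up values): a zero in a denominator such as `distances` produces `inf` with the "divide by zero" warning, and multiplying that `inf` by a zero distance produces `nan` with the "invalid value ... in multiply" warning.

```python
import warnings

import numpy as np

distances = np.array([0.0, 0.5])  # a duplicated x-value gives a zero distance
rises = np.array([0.25, 0.5])     # made-up CDF increments

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    slopes = rises / distances      # 0.25 / 0.0 -> inf ("divide by zero" warning)
    scaled = distances * slopes[0]  # 0.0 * inf -> nan ("invalid value" warning)

messages = [str(w.message) for w in caught]
print(slopes, scaled)
print(messages)
```

Older NumPy versions name the division ufunc `true_divide` in the message, as in the traceback above; newer ones say `divide`.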

I fixed this by replacing

```python
counts = collections.Counter(X)
idxs = np.msort(list(counts.keys()))
histogram = np.array([counts[i] for i in idxs])
```

with

```python
X = np.around(X, 15)
idxs, histogram = np.unique(X, return_counts=True)
```

Not the most elegant solution, but it seems to work.
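The workaround can be wrapped as a small function for testing outside unidip. `histogram_from_samples` and its `decimals` parameter are hypothetical names; the body is the rounding plus `np.unique(..., return_counts=True)` replacement quoted above. Rounding only merges values that agree to 15 decimal places, so for large-magnitude data it may not remove every near-duplicate; treat it as a workaround rather than a general guarantee.

```python
import numpy as np


def histogram_from_samples(X, decimals=15):
    """Return sorted unique values and their counts after rounding (hypothetical helper)."""
    X = np.around(np.asarray(X, dtype=float), decimals)
    # np.unique returns sorted unique values, matching the old Counter + msort ordering.
    idxs, histogram = np.unique(X, return_counts=True)
    return idxs, histogram


samples = [1.0, np.nextafter(1.0, 2.0), 1000.0, 1000.0]
idxs, histogram = histogram_from_samples(samples)
print(idxs, histogram)
# For these values, reflecting around the maximum no longer creates new duplicates.
print(len(np.unique(idxs.max() - idxs[::-1])) == len(idxs))
```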

By the way, this also fixed issue #2, because the "divide by zero" (only a warning) leads to an empty array, which later causes an index-out-of-bounds error.
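The link between the warning and the `IndexError` can be illustrated with made-up arrays that only mimic the expression `left_touchpoints[d_left == left_diffs][0]` from the traceback: once a `nan` reaches the comparison, `nan == x` is `False` for every `x`, the boolean mask selects nothing, and indexing `[0]` on the empty result fails.

```python
import numpy as np

# Made-up stand-ins for the arrays named in the traceback.
left_touchpoints = np.array([0, 3, 5])
left_diffs = np.array([np.nan, 0.1, 0.2])  # a nan produced upstream, e.g. by 0 * inf
d_left = left_diffs.max()                  # max() propagates nan

mask = d_left == left_diffs                # nan compares unequal to everything: all False
print(mask)
try:
    xl = left_touchpoints[mask][0]
except IndexError as err:
    print("IndexError:", err)
```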

collinleiber • Oct 31 '20 17:10