unidip
RuntimeWarning: divide by zero encountered in true_divide & invalid value encountered in multiply
Hi @BenjaminDoran,
I find unidip very helpful in identifying unimodal distributions. Thanks for the development of this package!
I get the following warnings when I run dip.diptst:
/python3.6/site-packages/unidip/dip.py:27: RuntimeWarning: divide by zero encountered in true_divide
slopes = (work_cdf[1:] - work_cdf[0]) / distances
/python3.6/site-packages/unidip/dip.py:30: RuntimeWarning: invalid value encountered in multiply
gcm.extend(work_cdf[0] + distances[:minslope_idx] * minslope)
After the warning, the process stalls for a very long time. Looking at the values, I couldn't pinpoint the reason why this is happening. Any suggestions? I wonder how I could avoid this warning.
Update on avoiding the warning. The warnings can be ignored as below.
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)
This is helpful when running the test inside a for loop that should not be interrupted.
However, I am still not sure why these warnings are raised in the first place.
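A narrower alternative to the global filter is to silence the warning only around the call that triggers it. The sketch below uses the standard `warnings.catch_warnings` context manager; `slopes_from_first` is a hypothetical stand-in for the slope computation shown in the warning message, not unidip's actual code, and the input values are made up.

```python
import warnings

import numpy as np


def slopes_from_first(cdf, distances):
    """Hypothetical stand-in for the expression named in the warning."""
    return (cdf[1:] - cdf[0]) / distances


cdf = np.array([0.0, 0.5, 1.0])
distances = np.array([0.0, 1.0])  # a zero distance triggers "divide by zero"

# The filter applies only inside this block; warnings elsewhere stay visible.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    slopes = slopes_from_first(cdf, distances)

print(slopes)
```

Note that this only hides the message. The resulting `inf` is still in the array, so any downstream failure remains.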
@rraadd88 please could you post an example of an input that generates this error?
Hi @tompollard, here's a table with example input that produced the warnings: dip_test.txt
Demo code:
import pandas as pd
df=pd.read_table('dip_test.txt')
import unidip.dip as dip
dip.diptst(df['col1'])
Output:
python3.6/site-packages/unidip/dip.py:27: RuntimeWarning: divide by zero encountered in true_divide
slopes = (work_cdf[1:] - work_cdf[0]) / distances
python3.6/site-packages/unidip/dip.py:30: RuntimeWarning: invalid value encountered in multiply
gcm.extend(work_cdf[0] + distances[:minslope_idx] * minslope)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-23-27275520ec3c> in <module>
1 import unidip.dip as dip
----> 2 dip.diptst(df['col1'])
python3.6/site-packages/unidip/dip.py in diptst(dat, is_hist, numt)
45 """ diptest with pval """
46 # sample dip
---> 47 d, (_, idxs, left, _, right, _) = dip_fn(dat, is_hist)
48
49 # simulate from null uniform
python3.6/site-packages/unidip/dip.py in dip_fn(dat, is_hist, just_dip)
106 d = d_right
107 else:
--> 108 xl = left_touchpoints[d_left == left_diffs][0]
109 xr = right_touchpoints[right_touchpoints >= xl][0]
110 d = d_left
IndexError: index 0 is out of bounds for axis 0 with size 0
Thanks for the reply!
I had the same issue with the dip test.
For me the problem was related to the precision of the idxs array.
Within the _lcm_ method the subtraction
idxs.max() - idxs[::-1]
led to the creation of new duplicates.
This resulted in distances of 0 in the _gcm_ method and caused the division by 0.
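To illustrate how such a subtraction can create duplicates, here is a minimal sketch with made-up values (not taken from dip_test.txt). Two positions that are distinct before the mirroring step become equal afterwards, because their difference is smaller than the floating-point spacing near the maximum.

```python
import numpy as np

# Hypothetical sorted positions; all three are distinct floats.
idxs = np.array([0.0, 1e-17, 1.0])

# Same expression as quoted above: mirror the positions around the maximum.
mirrored = idxs.max() - idxs[::-1]

# 1.0 - 1e-17 rounds back to 1.0, so two mirrored positions coincide.
gaps = np.diff(mirrored)

print(mirrored)
print(gaps)  # a gap of exactly zero means a zero distance between points
```

Whether the zero gap reaches the division depends on how the library computes its distances, which this sketch does not reproduce.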
I fixed this by replacing
counts = collections.Counter(X)
idxs = np.msort(list(counts.keys()))
histogram = np.array([counts[i] for i in idxs])
with
X = np.around(X, 15)
idxs, histogram = np.unique(X, return_counts=True)
Not the most elegant solution, but it seems to work.
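As a quick check of the replacement, the sketch below applies it to made-up data containing two values that differ by less than 1e-15. After rounding, `np.unique` merges them, and the mirrored positions no longer contain duplicates.

```python
import numpy as np

# Hypothetical data: 0.0 and 1e-17 are distinct floats but nearly equal.
X = np.array([1.0, 0.0, 1e-17, 1.0])

# The replacement from the comment above.
X = np.around(X, 15)
idxs, histogram = np.unique(X, return_counts=True)

# Mirroring around the maximum now yields distinct positions.
mirrored = idxs.max() - idxs[::-1]

print(idxs, histogram)
print(mirrored)
```

One caveat: `np.around(X, 15)` rounds to 15 decimal places in absolute terms, so it works best for data of roughly unit magnitude. Much larger or much smaller values may need a different tolerance.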
Btw., this also fixed issue #2: the "divide by zero" (which is only a warning) leads to an empty array, which later causes the out-of-bounds IndexError.
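One plausible mechanism for the empty array, sketched below, is that the division by zero produces `inf`, multiplying `inf` by a zero distance produces `nan`, and `nan` never compares equal to anything. A boolean mask built with `==` then selects nothing. This is an assumption about how the failure propagates, not a trace of unidip's code; the array names are hypothetical.

```python
import numpy as np

with np.errstate(divide="ignore", invalid="ignore"):
    slope = np.array([1.0]) / np.array([0.0])  # inf ("divide by zero")
    value = np.array([0.0]) * slope            # nan ("invalid value ... multiply")

# Hypothetical differences that all became nan.
diffs = np.concatenate([value, value])
d = diffs.max()  # nan propagates through max

# nan == nan is False, so the mask is all False and the selection is empty.
touchpoints = np.array([10, 20])
selected = touchpoints[d == diffs]

print(selected.size)
try:
    selected[0]
except IndexError as err:
    print(err)
```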