scikit-posthocs
Solving ValueError: 'All numbers are identical in mannwhitneyu'
Hi,
I often use your _posthoc_mannwhitney, and I get ValueError 'All numbers are identical in mannwhitneyu' when two groups consist of identical numbers. I think the p-value adjustment should still include the p-value (1.0) from those comparisons, so I modified the code in _posthocs.py like this:
def _posthoc_mannwhitney(
        a: Union[list, np.ndarray, DataFrame],
        val_col: str = None,
        group_col: str = None,
        use_continuity: bool = True,
        alternative: str = 'two-sided',
        p_adjust: str = None,
        sort: bool = True) -> DataFrame:
    '''Pairwise comparisons with Mann-Whitney rank test.

    Parameters
    ----------
    a : array_like or pandas DataFrame object
        An array, any object exposing the array interface or a pandas
        DataFrame. Array must be two-dimensional.

    val_col : str, optional
        Name of a DataFrame column that contains dependent variable values (test
        or response variable). Values should have a non-nominal scale. Must be
        specified if `a` is a pandas DataFrame object.

    group_col : str, optional
        Name of a DataFrame column that contains independent variable values
        (grouping or predictor variable). Values should have a nominal scale
        (categorical). Must be specified if `a` is a pandas DataFrame object.

    use_continuity : bool, optional
        Whether a continuity correction (1/2.) should be taken into account.
        Default is True.

    alternative : ['two-sided', 'less', or 'greater'], optional
        Whether to get the p-value for the one-sided hypothesis
        ('less' or 'greater') or for the two-sided hypothesis ('two-sided').
        Defaults to 'two-sided'.

    p_adjust : str, optional
        Method for adjusting p values.
        See statsmodels.sandbox.stats.multicomp for details.
        Available methods are:
        'bonferroni' : one-step correction
        'sidak' : one-step correction
        'holm-sidak' : step-down method using Sidak adjustments
        'holm' : step-down method using Bonferroni adjustments
        'simes-hochberg' : step-up method (independent)
        'hommel' : closed method based on Simes tests (non-negative)
        'fdr_bh' : Benjamini/Hochberg (non-negative)
        'fdr_by' : Benjamini/Yekutieli (negative)
        'fdr_tsbh' : two stage fdr correction (non-negative)
        'fdr_tsbky' : two stage fdr correction (non-negative)

    sort : bool, optional
        Specifies whether to sort DataFrame by group_col or not. Recommended
        unless you sort your data manually.

    Returns
    -------
    result : pandas.DataFrame
        P values.

    Notes
    -----
    Refer to `scipy.stats.mannwhitneyu` reference page for further details.

    Examples
    --------
    >>> x = [[1,2,3,4,5], [35,31,75,40,21], [10,6,9,6,1]]
    >>> sp.posthoc_mannwhitney(x, p_adjust = 'holm')
    '''
    x, _val_col, _group_col = __convert_to_df(a, val_col, group_col)
    x = x.sort_values(by=[_group_col, _val_col], ascending=True) if sort else x

    groups = x[_group_col].unique()
    x_len = groups.size
    vs = np.zeros((x_len, x_len))
    xg = x.groupby(_group_col)[_val_col]
    tri_upper = np.triu_indices(vs.shape[0], 1)
    tri_lower = np.tril_indices(vs.shape[0], -1)
    vs[:, :] = 0

    combs = it.combinations(range(x_len), 2)

    for i, j in combs:  # I modified this section
        try:
            vs[i, j] = ss.mannwhitneyu(
                xg.get_group(groups[i]),
                xg.get_group(groups[j]),
                use_continuity=use_continuity,
                alternative=alternative)[1]
        except ValueError as e:
            if str(e) == "All numbers are identical in mannwhitneyu":
                vs[i, j] = 1.0
            else:
                raise e

    if p_adjust:
        vs[tri_upper] = multipletests(vs[tri_upper], method=p_adjust)[1]

    vs[tri_lower] = np.transpose(vs)[tri_lower]
    np.fill_diagonal(vs, 1)
    return DataFrame(vs, index=groups, columns=groups)
Is this the right solution?
I'm not sure, but this error may not occur with other versions of scipy.
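As a possible alternative to matching the exception message, the check could happen before the test is called. This is only a minimal sketch, assuming the error is raised exactly when the pooled values of both groups contain a single distinct number. The names pooled_values_identical and guarded_pvalue are hypothetical and are not part of scikit-posthocs or scipy, and the test function is passed in as a callable so the guard does not depend on any particular scipy version.

```python
from typing import Callable, Sequence


def pooled_values_identical(x: Sequence[float], y: Sequence[float]) -> bool:
    """Return True when both samples together hold exactly one distinct value."""
    pooled = list(x) + list(y)
    return len(set(pooled)) == 1


def guarded_pvalue(
    x: Sequence[float],
    y: Sequence[float],
    test: Callable[[Sequence[float], Sequence[float]], float],
    fallback: float = 1.0,
) -> float:
    """Return `fallback` for constant pooled samples, otherwise `test(x, y)`.

    `test` is any callable that takes two samples and returns a p-value
    (hypothetical interface, chosen for this sketch).
    """
    if pooled_values_identical(x, y):
        # Every observation is tied, so skip the test and use the fallback.
        return fallback
    return test(x, y)
```

Inside the loop above, the callable could be something like lambda a, b: ss.mannwhitneyu(a, b, use_continuity=use_continuity, alternative=alternative)[1], so that vs[i, j] = guarded_pvalue(...) replaces the try/except block. Compared with comparing str(e) against a fixed message, this does not break if the wording of the error changes.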
Hello! Thank you for reporting this. I cannot find this type of error in the latest codebase of scipy. What version of scipy are you using?
Hello! Thank you for your reply.
import scipy
scipy.__version__
Out: '1.6.2'
My scipy version is 1.6.2.
Thank you.
I have rechecked this with the latest version of scipy. Now it is not throwing such an error. No fix is needed.