scikit-posthocs

Solving ValueError: 'All numbers are identical in mannwhitneyu'

Open fMizki opened this issue 3 years ago • 2 comments

Hi,

I often use your `_posthoc_mannwhitney` and get `ValueError: All numbers are identical in mannwhitneyu` when both groups being compared consist of the same repeated value. However, I think the p-value for those comparisons (p = 1.0) should still be included when the p-values are adjusted for multiple comparisons, so I modified the code in `_posthocs.py` as follows:

```python
# Imports used by this function. They already exist at the top of
# scikit_posthocs/_posthocs.py, where __convert_to_df is also defined
# as a private helper.
import itertools as it
from typing import Union

import numpy as np
import scipy.stats as ss
from pandas import DataFrame
from statsmodels.stats.multitest import multipletests


def _posthoc_mannwhitney(
        a: Union[list, np.ndarray, DataFrame],
        val_col: str = None,
        group_col: str = None,
        use_continuity: bool = True,
        alternative: str = 'two-sided',
        p_adjust: str = None,
        sort: bool = True) -> DataFrame:
    '''Pairwise comparisons with Mann-Whitney rank test.

    Parameters
    ----------
    a : array_like or pandas DataFrame object
        An array, any object exposing the array interface or a pandas
        DataFrame. Array must be two-dimensional.

    val_col : str, optional
        Name of a DataFrame column that contains dependent variable values (test
        or response variable). Values should have a non-nominal scale. Must be
        specified if `a` is a pandas DataFrame object.

    group_col : str, optional
        Name of a DataFrame column that contains independent variable values
        (grouping or predictor variable). Values should have a nominal scale
        (categorical). Must be specified if `a` is a pandas DataFrame object.

    use_continuity : bool, optional
        Whether a continuity correction (1/2.) should be taken into account.
        Default is True.

    alternative : ['two-sided', 'less', or 'greater'], optional
        Whether to get the p-value for the one-sided hypothesis
        ('less' or 'greater') or for the two-sided hypothesis ('two-sided').
        Defaults to 'two-sided'.

    p_adjust : str, optional
        Method for adjusting p values.
        See statsmodels.sandbox.stats.multicomp for details.
        Available methods are:
        'bonferroni' : one-step correction
        'sidak' : one-step correction
        'holm-sidak' : step-down method using Sidak adjustments
        'holm' : step-down method using Bonferroni adjustments
        'simes-hochberg' : step-up method  (independent)
        'hommel' : closed method based on Simes tests (non-negative)
        'fdr_bh' : Benjamini/Hochberg  (non-negative)
        'fdr_by' : Benjamini/Yekutieli (negative)
        'fdr_tsbh' : two stage fdr correction (non-negative)
        'fdr_tsbky' : two stage fdr correction (non-negative)

    sort : bool, optional
        Specifies whether to sort DataFrame by group_col or not. Recommended
        unless you sort your data manually.

    Returns
    -------
    result : pandas.DataFrame
        P values.

    Notes
    -----
    Refer to `scipy.stats.mannwhitneyu` reference page for further details.

    Examples
    --------
    >>> import scikit_posthocs as sp
    >>> x = [[1,2,3,4,5], [35,31,75,40,21], [10,6,9,6,1]]
    >>> sp.posthoc_mannwhitney(x, p_adjust='holm')
    '''
    x, _val_col, _group_col = __convert_to_df(a, val_col, group_col)
    x = x.sort_values(by=[_group_col, _val_col], ascending=True) if sort else x

    groups = x[_group_col].unique()
    x_len = groups.size
    vs = np.zeros((x_len, x_len))
    xg = x.groupby(_group_col)[_val_col]
    tri_upper = np.triu_indices(vs.shape[0], 1)
    tri_lower = np.tril_indices(vs.shape[0], -1)

    combs = it.combinations(range(x_len), 2)

    for i, j in combs:
        # Modified section: scipy 1.6.2 (the version used here) raises
        # ValueError when every value in both groups is identical. Record
        # p = 1.0 for that pair so it is still counted by the adjustment below.
        try:
            vs[i, j] = ss.mannwhitneyu(
                xg.get_group(groups[i]),
                xg.get_group(groups[j]),
                use_continuity=use_continuity,
                alternative=alternative)[1]
        except ValueError as e:
            if str(e) == "All numbers are identical in mannwhitneyu":
                vs[i, j] = 1.0
            else:
                raise

    if p_adjust:
        vs[tri_upper] = multipletests(vs[tri_upper], method=p_adjust)[1]

    # Mirror the upper triangle into the lower one; a group vs. itself is p = 1.
    vs[tri_lower] = np.transpose(vs)[tri_lower]
    np.fill_diagonal(vs, 1)
    return DataFrame(vs, index=groups, columns=groups)
```
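Why it matters that the p = 1.0 comparisons stay in the set being adjusted: most corrections scale with the number of tests m, so silently dropping a comparison makes every other adjusted p-value smaller. A minimal sketch, assuming a hand-written Bonferroni correction as a stand-in for `multipletests` (this is not its implementation) and made-up raw p-values:

```python
from typing import List, Sequence


def bonferroni(pvals: Sequence[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


# Hypothetical raw p-values for three pairwise comparisons; the third pair
# consisted of identical values and was assigned p = 1.0.
raw = [0.01, 0.02, 1.0]

print(bonferroni(raw))      # p = 1.0 pair kept: each p is multiplied by m = 3
print(bonferroni(raw[:2]))  # p = 1.0 pair dropped: m = 2, a weaker correction
```

Keeping the identical-group pair therefore gives the more conservative (and correct) family size.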

Is this the right solution?

I'm not sure, but this error may not occur with other versions of scipy.
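Because the exception text can differ between scipy versions (or disappear, as it may here), one alternative to matching the message is to check for the all-identical case before calling the test. A sketch under stated assumptions: `pairwise_p` is a hypothetical helper, the test function is passed in so the guard stays independent of scipy, and it follows the choice made in this issue of reporting p = 1.0 for such pairs.

```python
from typing import Callable, Sequence


def pairwise_p(
    x: Sequence[float],
    y: Sequence[float],
    test: Callable[[Sequence[float], Sequence[float]], float],
) -> float:
    """Return test(x, y), or 1.0 if every value in x and y is the same.

    The check runs before the test is called, so it does not rely on the
    wording of an exception raised by any particular scipy version.
    """
    pooled = list(x) + list(y)
    if pooled and all(v == pooled[0] for v in pooled):
        return 1.0
    return test(x, y)


# Possible use with scipy (not executed here):
# import scipy.stats as ss
# p = pairwise_p(a, b, lambda u, v: ss.mannwhitneyu(u, v, alternative='two-sided')[1])
```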

fMizki, May 22 '22 02:05

Hello! Thank you for reporting this. I cannot find this type of error in the latest codebase of scipy. What version of scipy are you using?

maximtrp, Jul 08 '22 11:07

Hello! Thank you for your reply.

```
import scipy
scipy.__version__
Out: '1.6.2'
```

My scipy version is 1.6.2.

Thank you.
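Why an older release such as 1.6.2 raised here: my reading of the scipy 1.6 source (an assumption worth checking against your installed version) is that `mannwhitneyu` computed a tie-correction factor T with `scipy.stats.tiecorrect` and raised this ValueError when T == 0, because the normal approximation's standard deviation, sqrt(T·n1·n2·(n1+n2+1)/12), is then zero. The sketch below recomputes T in plain Python; `tie_correction` and `u_std` are hypothetical helpers, not scipy functions.

```python
import math
from collections import Counter
from typing import Sequence


def tie_correction(pooled: Sequence[float]) -> float:
    """Tie-correction factor T = 1 - sum(t^3 - t) / (n^3 - n).

    t runs over the sizes of each group of tied values in the pooled
    sample and n is the pooled sample size.
    """
    n = len(pooled)
    if n < 2:
        return 1.0
    ties = sum(t**3 - t for t in Counter(pooled).values())
    return 1.0 - ties / (n**3 - n)


def u_std(n1: int, n2: int, T: float) -> float:
    """Standard deviation of U under the tie-corrected normal approximation."""
    return math.sqrt(T * n1 * n2 * (n1 + n2 + 1) / 12.0)


x, y = [5, 5, 5], [5, 5]
T = tie_correction(x + y)          # all five values tied, so T is zero
print(T, u_std(len(x), len(y), T))  # 0.0 0.0
```

With a zero standard deviation the z-score cannot be formed, which is consistent with the error appearing only when both groups share one repeated value.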

fMizki, Jul 11 '22 00:07

I have rechecked this with the latest version of scipy, and it no longer throws this error, so no fix is needed.

maximtrp, May 08 '23 08:05