pingouin icon indicating copy to clipboard operation
pingouin copied to clipboard

partial_corr documentation / assertion wrong

Open mkeds opened this issue 2 years ago • 4 comments

The following minimum code example straight from documentation

https://pingouin-stats.org/build/html/generated/pingouin.partial_corr.html#pingouin.partial_corr Example 1

import pingouin as pg df = pg.read_dataset('partial_corr') pg.partial_corr(data=df, x='x', y='y', covar='cv1').round(3) n r CI95% p-val pearson 30 0.568 [0.25, 0.77] 0.001

returns: AssertionError: columns are not in dataframe.

on line 843 from correlation.py: assert all([c in data for c in col]), "columns are not in dataframe."

which kind of makes sense, because just before in line 842 we have: col = _flatten_list([x, y, covar, x_covar, y_covar])

and since we don't supply x_covar y_covar when calling the function in the example, they default to None in function header, and None in not in the DataFrame by default:

In[664]: None in df Out[664]: False

Is this expected behaviour or something wrong with my Pandas? I'm using pandas=2.1.2,

mkeds avatar Nov 08 '23 13:11 mkeds

partial_corr() assertion checks seem wobbly in general, e.g.:

        assert x != covar, "x and covar must be independent"
        assert y != covar, "y and covar must be independent"

These come where we yet not know whether covar is a list or a string. If it's a list, truth value is ambiguous.

I'd suggest following small changes: In 835:

    assert x != y, "x and y must be independent"
    if isinstance(covar, list):
        assert x not in covar, "x and covar must be independent"
        assert y not in covar, "y and covar must be independent"
    else:
        assert x != covar, "x and covar must be independent"
        assert y != covar, "y and covar must be independent"

And then in 842:

    # Check that columns exist
    col = _flatten_list([x, y, covar, x_covar, y_covar])
    col = [i for i in col if i is not None] # <-- add this
    assert all([c in data for c in col]), "columns are not in dataframe."

without trimming the col list from None values, the assertion is triggered each time we don't have x_covar and y_covar - which we should not have at the same time, anyway.

Hope this helps somebody.

mkeds avatar Nov 09 '23 14:11 mkeds

Also, note that in partial_corr() you drop NA's from your data, while in pcorr() you don't, which at best throws an error / some NA's in the result, and at worst is leading to different results in a situation which should lead to the same results.

mkeds avatar Nov 09 '23 14:11 mkeds

Hi @mkeds

  1. The AssertionError occurs in Python 3.12 and has already been fixed in https://github.com/raphaelvallat/pingouin/pull/370, however I did not have the time to release a new version of Pingouin yet.
  2. The proposed update to the assertion check is a good idea. I'll implement it now.
  3. Feel free to open a PR to drop nan values in pingouin.pcorr as well, ideally with a minimal example showing that the output is different between the two functions.

Thanks, Raphael

raphaelvallat avatar Nov 10 '23 20:11 raphaelvallat