
Is there an equivalent of pandas' DataFrame.corr?

Open Arend-Jan opened this issue 2 years ago • 1 comments

Research

  • [X] I have searched the above polars tags on Stack Overflow for similar questions.

  • [ ] I have asked my usage related question on Stack Overflow.

Link to question on Stack Overflow

No response

Question about Polars

Pandas has a function for calculating correlation over an entire dataframe called corr(). Does polars have an equivalent for this?

Arend-Jan avatar Jan 03 '23 10:01 Arend-Jan

Yes - https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.pearson_corr.html#polars-pearson-corr

StijnKas avatar Jan 03 '23 11:01 StijnKas

And on the dataframe as well: https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.pearson_corr.html#polars.DataFrame.pearson_corr

ritchie46 avatar Jan 04 '23 09:01 ritchie46

@ritchie46 Hi, I think there is a bug here. pearson_corr currently seems to work only when the number of rows equals the number of columns; otherwise it raises an exception like this:

import polars as pl
df = pl.DataFrame({"foo": [1, 2, 3, 4], "bar": [3, 2, 1, 4], "ham": [7, 8, 9, 4]})
df.pearson_corr()
  File "1.py", line 5, in <module>
    df.pearson_corr()
  File "/home/appadmin/anaconda3/lib/python3.8/site-packages/polars/internals/dataframe/frame.py", line 7362, in pearson_corr
    np.corrcoef(self, **kwargs),
  File "<__array_function__ internals>", line 5, in corrcoef
  File "/home/appadmin/anaconda3/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2529, in corrcoef
    c = cov(x, y, rowvar)
  File "<__array_function__ internals>", line 5, in cov
  File "/home/appadmin/anaconda3/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2369, in cov
    m = np.asarray(m)
  File "/home/appadmin/anaconda3/lib/python3.8/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: cannot copy sequence with size 4 to array axis with dimension 3

The cause is that converting the polars DataFrame to a NumPy array fails. My NumPy version is 1.18.5 and my Polars version is 0.16.5.

import numpy as np
import polars as pl
df = pl.DataFrame({"foo": [1, 2, 3, 4], "bar": [3, 2, 1, 4], "ham": [7, 8, 9, 4]})
np.asarray(df)  # fails

drivenow avatar Feb 27 '23 07:02 drivenow
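
For reference, a frame with 4 rows and 3 columns like the one in the report should produce a 3x3 correlation matrix, one entry per pair of columns, so the row count should not need to match the column count. The following is a minimal, dependency-free sketch of that expected result. It is not how polars or NumPy compute it, and `pearson` and `corr_matrix` are hypothetical helper names introduced only for illustration:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def corr_matrix(columns):
    """Return a nested dict {col_a: {col_b: r}} covering every column pair."""
    return {
        a: {b: pearson(columns[a], columns[b]) for b in columns}
        for a in columns
    }


# Same data as in the bug report: 4 rows, 3 columns.
columns = {"foo": [1, 2, 3, 4], "bar": [3, 2, 1, 4], "ham": [7, 8, 9, 4]}
matrix = corr_matrix(columns)

for name, row in matrix.items():
    print(name, [round(r, 3) for r in row.values()])
```

The result has one row and one column per input column (3x3 here), with 1.0 on the diagonal; for this data the foo/bar coefficient is 0.2. A working pearson_corr call on the same frame would be expected to agree with these values.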