polars
`corr` option to ignore null values
Description
The current polars.DataFrame.corr returns nulls when columns contain nulls. Sometimes ignoring nulls would be the desired behavior. For comparison, pandas.DataFrame.corr excludes NA/null values.
Our (newly enforced) policy is that nulls should, whenever possible, be treated as completely absent by default. The problem with correlation is that you can have a scenario where only one of the two variables is missing in a given row.
I think it makes sense to change the default to calculate the correlation only using those rows where both columns have a value. Perhaps a statistician who has a stronger understanding of how the correlation coefficient is used could weigh in on that?
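To make the proposal concrete, here is a minimal sketch of "only those rows where both columns have a value" (often called pairwise-complete correlation). It is plain Python rather than polars code: `pairwise_complete_corr` is a hypothetical helper name, `None` stands in for a null, and this is not how polars or pandas implement it.

```python
import math
from typing import Optional, Sequence


def pairwise_complete_corr(
    xs: Sequence[Optional[float]], ys: Sequence[Optional[float]]
) -> Optional[float]:
    """Pearson correlation over rows where both values are present.

    Hypothetical helper for illustration; None plays the role of a null.
    """
    # Keep only the rows in which neither value is missing.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None  # not enough complete rows to define a correlation

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # a constant column has no defined correlation

    return cov / math.sqrt(var_x * var_y)


# Each column is missing a value in a different row, so only three rows are used.
a = [1.0, 2.0, None, 4.0, 5.0]
b = [2.0, None, 6.0, 8.0, 10.0]
result = pairwise_complete_corr(a, b)
```

Note that with more than two columns, each pair of columns may end up using a different subset of rows, which is part of why the choice of default deserves some thought.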
I agree with you. The current corr behavior could still be reproduced by checking for nulls first and then calling the new corr function, which is simple to do. So a broader solution would be welcome.
Hi, I agree with the above. It would be very nice to have the same behavior as pandas, which also handles NaN values (https://github.com/pandas-dev/pandas/blob/d928a5cc222be5968b2f1f8a5f8d02977a8d6c2d/pandas/_libs/algos.pyx#L349 => nancorr).