
Why doesn't identify_collinear consider the statistical significance of the Pearson coefficient?

Open EugeniaKoKo opened this issue 4 years ago • 0 comments

In the identify_collinear method, I noticed that the p-value of the Pearson coefficient is not taken into account. As a result, features can be removed on the basis of a correlation that is not statistically significant.

This could be addressed simply by adding a p-value check for each identified correlation:

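from scipy import stats

# pearsonr returns (correlation coefficient, two-sided p-value)
_, pvalue = stats.pearsonr(data[feat1], data[feat2])
if pvalue < 0.05:
    ...  # the correlation is statistically significant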

The significance level could also be exposed as a threshold parameter, so that 0.01 can be used instead of 0.05.
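To make the proposal concrete, here is a minimal standalone sketch of the check with a configurable significance level. It does not use or modify feature-selector itself: the function name `significant_collinear_pairs` and the parameters `correlation_threshold` and `alpha` are hypothetical, chosen only for illustration.

```python
import pandas as pd
from scipy import stats


def significant_collinear_pairs(data, correlation_threshold=0.9, alpha=0.05):
    """Return (feat1, feat2, r, pvalue) for every pair of columns whose
    absolute Pearson correlation exceeds correlation_threshold AND whose
    p-value is below alpha.

    Hypothetical helper for illustration; not part of feature-selector.
    """
    columns = list(data.columns)
    pairs = []
    for i, feat1 in enumerate(columns):
        # Only look at columns after feat1 so each pair is tested once.
        for feat2 in columns[i + 1:]:
            r, pvalue = stats.pearsonr(data[feat1], data[feat2])
            if abs(r) > correlation_threshold and pvalue < alpha:
                pairs.append((feat1, feat2, r, pvalue))
    return pairs


# With only three observations, a strong correlation can still fail the
# significance check, so whether the pair is reported depends on alpha.
small = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [1.0, 2.0, 3.5]})
print(significant_collinear_pairs(small, correlation_threshold=0.9, alpha=0.05))
print(significant_collinear_pairs(small, correlation_threshold=0.9, alpha=0.1))
```

The check matters mostly for small samples. With many rows, a correlation above a high threshold such as 0.9 will almost always have a very small p-value, so the filter would rarely change the result there.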

EugeniaKoKo, Mar 11 '20 15:03