feature-selector
Why doesn't identify_collinear consider the statistical significance of the Pearson coefficient?
In the identify_collinear method, I noticed that the p-value of the Pearson coefficient is not taken into account. As a result, features can be removed on the basis of a correlation that is not statistically significant.
This could be addressed simply by adding a p-value check for each identified correlation:
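from scipy import stats
# pearsonr returns the correlation coefficient and its two-sided p-value
pvalue = stats.pearsonr(data[feat1], data[feat2])[1]
if pvalue < 0.05:
    ...  # only then treat the pair as collinear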
The significance level could also be exposed as a parameter, so that, for example, 0.01 can be used instead of 0.05.
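
To make the proposal concrete, here is a minimal sketch of how such a check might look. It is not feature-selector's actual implementation: the helper name collinear_pairs and the alpha parameter are hypothetical, and it assumes numeric columns without missing values, since stats.pearsonr does not handle NaN.

```python
import numpy as np
import pandas as pd
from scipy import stats


def collinear_pairs(data, correlation_threshold=0.9, alpha=0.05):
    """Hypothetical helper: list column pairs whose Pearson correlation
    exceeds the threshold in magnitude AND is significant at level alpha."""
    pairs = []
    cols = list(data.columns)
    for i, feat1 in enumerate(cols):
        for feat2 in cols[i + 1:]:
            # pearsonr returns the coefficient and its two-sided p-value
            r, pvalue = stats.pearsonr(data[feat1], data[feat2])
            if abs(r) > correlation_threshold and pvalue < alpha:
                pairs.append((feat1, feat2, r, pvalue))
    return pairs


# Synthetic data: "b" is a noisy copy of "a", "c" is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = pd.DataFrame({
    "a": x,
    "b": x + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
pairs = collinear_pairs(data)

# Three rows only: the coefficient is high, but the p-value is not small,
# so the pair is not flagged.
tiny = pd.DataFrame({"u": [1.0, 2.0, 3.0], "v": [1.0, 2.0, 2.5]})
tiny_pairs = collinear_pairs(tiny)
```

The second data set illustrates the concern raised here: with very few observations, a coefficient above the threshold can still have a p-value above 0.05, so a threshold on the coefficient alone would flag the pair.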