feature-selector

identify_collinear gives wrong results when features with 100% missing values exist

Open bison31205 opened this issue 6 years ago • 0 comments

There is a situation where identify_collinear() gives misleading results: if my data has a feature with 100% missing values, or a very high share such as 98%, calling identify_collinear() reports more features with a correlation magnitude greater than the correlation_threshold than it should.

I checked the result of pd.DataFrame.corr(), and there were high correlations between some features and the feature with 98% missing values. As a result, calling identify_all() removes more features than it should. We should first remove the features whose fraction of missing values exceeds the missing threshold, and only then identify collinear features. Maybe there are better strategies.
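To illustrate the ordering suggested above, here is a minimal sketch in plain pandas. It is not the feature-selector implementation: the helper names `drop_high_missing` and `collinear_columns`, the thresholds, and the toy data are all assumptions made for this example. It relies only on the fact that `DataFrame.corr()` uses pairwise-complete observations, so a column with only two non-missing values correlates perfectly with almost any other column.

```python
import numpy as np
import pandas as pd


def drop_high_missing(df, missing_threshold=0.9):
    """Hypothetical helper: drop columns whose missing fraction exceeds the threshold."""
    missing_fraction = df.isna().mean()
    keep = missing_fraction[missing_fraction <= missing_threshold].index
    return df[keep]


def collinear_columns(df, correlation_threshold=0.95):
    """Hypothetical helper: flag columns correlated above the threshold with an earlier column."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > correlation_threshold).any()]


# Toy data (an assumption): two independent dense features and one feature
# that is 98% missing, with only two observed values.
rng = np.random.default_rng(0)
sparse = np.full(100, np.nan)
sparse[:2] = [1.0, 2.0]
df = pd.DataFrame({
    "sparse": sparse,
    "a": rng.normal(size=100),
    "b": rng.normal(size=100),
})

# Correlation computed directly on the raw data.
flagged_raw = collinear_columns(df)

# Correlation computed after removing the mostly-missing column first.
flagged_clean = collinear_columns(drop_high_missing(df))

print("flagged without filtering:", flagged_raw)
print("flagged after filtering:  ", flagged_clean)
```

In this toy data the two dense features are flagged as collinear only because each overlaps with the sparse feature on two rows. Filtering by missing fraction first avoids that. Another option worth considering is the `min_periods` argument of `DataFrame.corr()`, which returns NaN for pairs with too few shared observations.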

bison31205, Dec 14 '18 10:12