healthcareai-py
Expose feature importances as list via a method on a random forest TSM
Short-term urgent fix:
- Refactor out the list that is passed to the plotter.
- Have the plotter limit the list length, so that the full list can easily be printed to the console with its values.
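Something like:
import numpy as np
import pandas as pd

feature_names = trained_random_forest.column_names
aggregate_features_importances = trained_random_forest.model.best_estimator_.feature_importances_
# Indices that order the importances from largest to smallest
indices = np.argsort(aggregate_features_importances)[::-1]
sorted_feature_names = [feature_names[i] for i in indices]
# Reorder the values with the same indices so each stays paired with its feature
sorted_importances = aggregate_features_importances[indices]
df = pd.DataFrame({'feature': sorted_feature_names, 'relative_importance': sorted_importances})
# df.to_csv('feature_importances.csv')
df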
Or something like this snippet I've used in ad hoc Jupyter notebooks:
def importance_list(df, predicted_column, rf):
    columns = [x for x in df.columns if x not in [predicted_column]]
    values = rf.model.best_estimator_.feature_importances_
    print(len(columns), len(values))
    importances = pd.DataFrame({
        'feature': columns,
        'importance': values})
    importances.sort_values('importance', ascending=False, inplace=True)
    importances['importance'] = 100 * importances['importance']
    importances.reset_index(drop=True, inplace=True)
    return importances
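
To make the two bullets in the proposed fix concrete, here is a minimal sketch of how the split might look: one function builds the full sorted list, and a separate helper trims it for the plotter. The names `feature_importance_list`, `limit_for_plot` and `max_features` are hypothetical and not part of healthcareai-py. The sketch assumes the trained model exposes `column_names` and `model.best_estimator_.feature_importances_` as in the snippets above, and that `column_names` already excludes the predicted column. A stand-in object replaces a real trained random forest so the example runs on its own.

```python
from types import SimpleNamespace


def feature_importance_list(trained_model):
    """Return (feature, importance) pairs sorted from most to least important.

    Hypothetical helper: assumes `column_names` lines up one-to-one with
    `feature_importances_`, as in the snippets above.
    """
    names = list(trained_model.column_names)
    values = [float(v) for v in trained_model.model.best_estimator_.feature_importances_]
    if len(names) != len(values):
        raise ValueError(
            'Got {} feature names but {} importances'.format(len(names), len(values)))
    # Sort the pairs together so each value stays attached to its feature name
    return sorted(zip(names, values), key=lambda pair: pair[1], reverse=True)


def limit_for_plot(importances, max_features=15):
    """Keep only the top entries for plotting; the full list is left untouched."""
    return importances[:max_features]


# Stand-in for a trained random forest model (illustrative values only)
fake_model = SimpleNamespace(
    column_names=['age', 'bmi', 'smoker'],
    model=SimpleNamespace(
        best_estimator_=SimpleNamespace(feature_importances_=[0.2, 0.5, 0.3])),
)

full_list = feature_importance_list(fake_model)
for name, value in full_list:
    print('{:<10} {:.3f}'.format(name, value))

# The plotter would receive only the truncated list
plot_list = limit_for_plot(full_list, max_features=2)
```

Keeping the truncation in the plotting path means the method on the model can always return every feature, which is what makes printing the complete list to the console straightforward.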