healthcareai-py

Expose feature importances as list via a method on a random forest TSM

Open Aylr opened this issue 7 years ago • 0 comments

Short-term urgent fix:

  • Refactor out the list that is passed to the plotter
  • Have the plotter limit the list length itself, so that the full list (with values) can still easily be printed to the console
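A rough sketch of that split, using only the standard library. The names `sorted_feature_importances`, `limit_for_plot`, and `max_features` are hypothetical and not part of healthcareai-py, and the feature names and values are made up for illustration:

```python
from typing import List, Sequence, Tuple


def sorted_feature_importances(
    feature_names: Sequence[str], importances: Sequence[float]
) -> List[Tuple[str, float]]:
    """Return (feature, importance) pairs sorted from most to least important.

    Hypothetical helper: stands in for the method the issue asks for.
    """
    if len(feature_names) != len(importances):
        raise ValueError("feature_names and importances must have the same length")
    pairs = list(zip(feature_names, importances))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def limit_for_plot(
    pairs: Sequence[Tuple[str, float]], max_features: int = 15
) -> List[Tuple[str, float]]:
    """Truncate the list on the plotting side only (hypothetical helper)."""
    return list(pairs[:max_features])


# Made-up example data, not real model output
names = ["age", "bmi", "a1c", "systolic_bp"]
values = [0.10, 0.25, 0.60, 0.05]

# The full list stays available for printing to the console
full_list = sorted_feature_importances(names, values)
for name, value in full_list:
    print(f"{name}: {value:.2f}")

# The plotter would only ever see the shortened list
plot_list = limit_for_plot(full_list, max_features=2)
```

Keeping the truncation inside the plotting step means the same full list can feed both the console output and the plot.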

The list itself could be built with something like:

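    import numpy as np
    import pandas as pd

    feature_names = trained_random_forest.column_names

    aggregate_features_importances = trained_random_forest.model.best_estimator_.feature_importances_
    # Indices that order the importances from highest to lowest
    indices = np.argsort(aggregate_features_importances)[::-1]
    sorted_feature_names = [feature_names[i] for i in indices]
    # Reorder the values with the same indices so they stay aligned with the names
    sorted_importances = aggregate_features_importances[indices]
    df = pd.DataFrame({'feature': sorted_feature_names, 'relative_importance': sorted_importances})
    # df.to_csv('feature_importances.csv')
    df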

Or something like this snippet I've used in ad hoc Jupyter notebooks:

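    import pandas as pd

    def importance_list(df, predicted_column, rf):
        """Return feature importances (as percentages), sorted from highest to lowest."""
        # Assumes the remaining columns are in the same order the model was trained on
        columns = [x for x in df.columns if x != predicted_column]
        values = rf.model.best_estimator_.feature_importances_
        # Sanity check: the two lengths should match
        print(len(columns), len(values))

        importances = pd.DataFrame({
            'feature': columns,
            'importance': values})
        importances.sort_values('importance', ascending=False, inplace=True)
        # Scale the importances to percentages
        importances['importance'] = 100 * importances['importance']
        importances.reset_index(drop=True, inplace=True)

        return importances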

Aylr, Sep 07 '17 16:09