Expose feature importances as list via a method on a random forest TSM

Open Aylr opened this issue 7 years ago • 0 comments

Short-term urgent fix:

- Refactor out the list that is passed to the plotter.
- Have the plotter limit the list length, so that the full list can easily be printed to the console with its values.

Something like

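    import numpy as np
    import pandas as pd

    feature_names = trained_random_forest.column_names

    aggregate_features_importances = trained_random_forest.model.best_estimator_.feature_importances_
    # Indices ordered from most to least important
    indices = np.argsort(aggregate_features_importances)[::-1]
    sorted_feature_names = [feature_names[i] for i in indices]
    # Reorder the importances with the same indices so they line up with the names
    sorted_importances = aggregate_features_importances[indices]
    df = pd.DataFrame({'feature': sorted_feature_names, 'relative_importance': sorted_importances})
    # df.to_csv('feature_importances.csv')
    df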

or something like this snippet I've used in ad hoc Jupyter notebooks

    import pandas as pd

    def importance_list(df, predicted_column, rf):
        # Every column except the target; assumes df column order matches the training order
        columns = [x for x in df.columns if x not in [predicted_column]]
        values = rf.model.best_estimator_.feature_importances_
        # Sanity check: the two lengths should match
        print(len(columns), len(values))

        importances = pd.DataFrame({
            'feature': columns,
            'importance': values})
        importances.sort_values('importance', ascending=False, inplace=True)
        # Express importances as percentages
        importances['importance'] = 100 * importances['importance']
        importances.reset_index(drop=True, inplace=True)

        return importances
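
The two tasks listed at the top (keep the full list available, let the plotter truncate it) might be separated roughly as follows. This is only a sketch using plain Python lists. The helper names `sorted_importances` and `limit_for_plot`, the `max_features` default, and the sample feature names and values are all hypothetical and are not part of healthcareai-py.

```python
def sorted_importances(feature_names, importances):
    """Return the full list of (feature, importance) pairs, most important first."""
    if len(feature_names) != len(importances):
        raise ValueError("feature_names and importances must have the same length")
    pairs = zip(feature_names, (float(v) for v in importances))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def limit_for_plot(pairs, max_features=15):
    """Keep only the top entries for plotting; the caller still holds the full list."""
    return pairs[:max_features]


# Made-up names and values, for illustration only
full = sorted_importances(["age", "bmi", "glucose"], [0.2, 0.5, 0.3])
for name, value in full:
    print(f"{name}: {value:.3f}")  # the full list goes to the console

top_two = limit_for_plot(full, max_features=2)  # only this slice goes to the plotter
```

With this split, a method on the trained model could return the full list, and the plotter would be the only place that applies a length limit.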

Sep 07 '17 16:09 Aylr
