MLJ.jl

Improved feature importance support

Open ablaom opened this issue 3 years ago • 5 comments

The MLJ model API says only that models reporting feature importances should include them in the report output by fit. It says nothing about the actual format of this output, and I can see inconsistencies across implementations. Feature importances are used by some meta-algorithms, such as RecursiveFeatureElimination (#426), so this might be worth sorting out.

I propose adding a new method feature_importances(model::Model, report) to the model API to report the scores, according to some fixed convention. ~~Some models (e.g., LightGBM models) report multiple types of importance scores. So I propose this method return a named tuple keyed on the type, whose values are Float64 vectors.~~

Edit: See the suggested format below.

Edit: The proposal follows the same interface pattern that we already have for training_losses.

Thoughts anyone?

TODO:

  • [x] Add reports_feature_importances trait to StatisticalTraits, defaulting to false
  • [x] Add feature_importances(model, report) stub to MLJModelInterface (in model_api.jl); fallback to return nothing.
  • [x] In MLJBase: Overload MMI.feature_importances(mach::Machine) following this pattern
  • [x] Update MLJ model API docs
  • [x] In MLJ: https://github.com/alan-turing-institute/MLJ.jl/issues/954
  • [ ] Roll out implementations for packages that already report importances in their report (including linear models, for which the absolute values of the coefficients can serve). These include:
    • [x] MLJDecisionTreeInterface
    • [x] EvoTrees models
    • [ ] MLJLinearModels
    • [ ] MLJGLMInterface
    • [x] MLJXGBoostInterface
    • [ ] LightGBM
      It may make sense to roll out data front-ends for some of these models at the same time, mimicking the EvoTrees case, where this is already done.
  • [ ] Get a list of scikit-learn models that expose importances or coefficients, get these models to report the scores in their report, and implement the above method and trait. See https://github.com/JuliaAI/MLJScikitLearnInterface.jl/issues/30 and https://github.com/JuliaAI/MLJScikitLearnInterface.jl/issues/26
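
The first three checklist items could look roughly like the following. This is a self-contained sketch, not the actual code in StatisticalTraits, MLJModelInterface, or MLJBase: the `Model` type, `ToyTree`, `PlainModel`, `ToyMachine`, and the layout of `report` are stand-ins invented for illustration. Only the names `reports_feature_importances` and `feature_importances(model, report)` come from the list above.

```julia
# Stand-in for the abstract model type (hypothetical; not MLJModelInterface.Model).
abstract type Model end

# Trait: defaults to false, so models have to opt in explicitly.
reports_feature_importances(::Type{<:Model}) = false
reports_feature_importances(model::Model) = reports_feature_importances(typeof(model))

# Stub: the fallback returns `nothing` for models that report no importances.
feature_importances(model::Model, report) = nothing

# A hypothetical model that opts in and reads the scores from its report.
struct ToyTree <: Model end
reports_feature_importances(::Type{ToyTree}) = true
feature_importances(::ToyTree, report) = report.importances

# A hypothetical model that keeps the defaults.
struct PlainModel <: Model end

# Stand-in for a machine: it bundles a model with the report produced by training.
struct ToyMachine{M<:Model,R}
    model::M
    report::R
end

# Machine-level overload: forward to the model-level method.
feature_importances(mach::ToyMachine) = feature_importances(mach.model, mach.report)

tree_mach = ToyMachine(ToyTree(), (importances = [:x1 => 0.7, :x2 => 0.3],))
plain_mach = ToyMachine(PlainModel(), NamedTuple())
```

With these definitions, a meta-algorithm could check `reports_feature_importances(model)` before calling `feature_importances`, rather than inspecting the report directly.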

ablaom • Feb 28 '21 23:02

cc @boliu-christine

ablaom • Mar 01 '21 00:03

Here's an update on my suggestion for the format of feature importances, as returned by the proposed method feature_importances(model, report).

I think allowing models to expose multiple types of feature importance is overkill / excessively complicated. Of course multiple scores can still be declared in the report itself.

So I suggest a vector of name => float pairs, where name is a symbol:

v = [:gender => 0.23, :height =>, :weight => 0.1]
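
To make the suggested format concrete, here is a minimal sketch that builds such a vector from a report holding several kinds of raw scores, exposing only one of them. The report layout (`features`, `gain`, `split_count`), all the numbers, and the helper `importances_from_report` are made up for illustration and are not part of any MLJ package.

```julia
# Hypothetical report: several score types may live here side by side.
report = (
    features = [:gender, :height, :weight],
    gain = [0.5, 0.3, 0.2],
    split_count = [4, 7, 2],
)

# Pair each feature name with one chosen score, giving a
# Vector{Pair{Symbol,Float64}} as in the suggested format.
function importances_from_report(report; score::Symbol = :gain)
    scores = Float64.(getproperty(report, score))
    return [name => s for (name, s) in zip(report.features, scores)]
end

v = importances_from_report(report)

# Pairs sort naturally by score, most important feature first.
ranked = sort(v; by = last, rev = true)
```

A vector of pairs keeps the feature order chosen by the model, and it converts easily to a dictionary with `Dict(v)` when lookup by name is more convenient.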

ablaom • Dec 21 '21 20:12

What is the current state of this? I need feature importance support!

zsz00 • Jan 29 '22 05:01

> What is the current state of this? I need feature importance support!

Am still working on this. Will be done soon.

OkonSamuel • Jan 29 '22 10:01

What is the current state of this, @OkonSamuel?

zsz00 • Mar 06 '22 14:03