
How is the error calculated for Permutation Importance?

Open · aghoshpub opened this issue 5 years ago · 16 comments

If we do:

```python
perm = PermutationImportance(D, random_state=1, n_iter=2,
                             scoring=significance_scorer).fit(X_test, y_test)
eli5.show_weights(perm, feature_names=data.columns.tolist())
```

Then we get some scores with ± errors, but these errors don't correspond to std/sqrt(n_iterations).

How are these errors computed?

aghoshpub avatar Jun 18 '19 15:06 aghoshpub

The errors are given by the score_func() of your estimator, but I am not sure whether the values returned by eli5's get_score_importances() are really in the units of the score. I used negative RMSE as the score.

lkugler avatar Sep 04 '19 20:09 lkugler

I have a custom score function, significance_scorer, and the scores are in the right units, but the errors make no sense to me. As a result, I wrote my own permutation importance implementation that is more transparent, and I don't use ELI5 anymore.

aghoshpub avatar Sep 05 '19 11:09 aghoshpub

@aghoshpub Why did the errors make no sense to you? I am seeing negative importances and I doubt that my model would give high weights to features that worsen the score...

lkugler avatar Sep 05 '19 11:09 lkugler

What is the formula used to calculate the errors? It doesn't correspond to std/sqrt(n_iterations), at least. So what is it?

aghoshpub avatar Sep 05 '19 12:09 aghoshpub

Indeed, I just opened an issue with a similar question to yours: https://github.com/TeamHG-Memex/eli5/issues/365.

I have found a connection, but I am not sure why that is. Also, I am still not sure whether the displayed weights are actually the reduction in model performance or just the model's weights for the features/variables.

seralouk avatar Feb 28 '20 18:02 seralouk

I did some deep research. After going through the source code, here is what I believe happens in the case where cv is used and is not prefit or None. I use a K-Folds scheme for my application. I also use an SVC model, so the score is the accuracy in this case.

By looking at the fit method of the PermutationImportance object, the _cv_scores_importances are computed (https://github.com/TeamHG-Memex/eli5/blob/master/eli5/sklearn/permutation_importance.py#L202). The specified cross-validation scheme is used, and the base_scores and feature_importances are computed on the test data (function _get_score_importances inside _cv_scores_importances).

By looking at the get_score_importances function (https://github.com/TeamHG-Memex/eli5/blob/master/eli5/permutation_importance.py#L55), we can see that base_score is the score on the non-shuffled data, and the feature_importances (called scores_decreases there) are defined as non-shuffled score minus shuffled score (see https://github.com/TeamHG-Memex/eli5/blob/master/eli5/permutation_importance.py#L93).

Finally, the errors (feature_importances_std_) are the SD of the above feature_importances (https://github.com/TeamHG-Memex/eli5/blob/master/eli5/sklearn/permutation_importance.py#L209), and feature_importances_ is the mean of the above feature_importances (non-shuffled score minus shuffled score).
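Putting the above together, here is a minimal sketch of that computation (an illustration of the logic, not the actual eli5 source; score_func(X, y) stands in for whatever scorer you use):

```python
import numpy as np

def permutation_importances(score_func, X, y, n_iter=5, random_state=0):
    # Importance = non-shuffled score minus shuffled score, per feature,
    # repeated n_iter times; mean and SD are taken over the repetitions.
    rng = np.random.RandomState(random_state)
    base_score = score_func(X, y)             # score on the non-shuffled data
    results = np.empty((n_iter, X.shape[1]))  # [#iterations, #features]
    for it in range(n_iter):
        for col in range(X.shape[1]):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, col])       # permute a single column in place
            results[it, col] = base_score - score_func(X_perm, y)
    importances = results.mean(axis=0)        # -> feature_importances_
    importances_std = results.std(axis=0)     # -> feature_importances_std_
    return base_score, importances, importances_std
```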

All of the above answers your question, but not mine: https://github.com/TeamHG-Memex/eli5/issues/365

seralouk avatar Feb 29 '20 13:02 seralouk

@seralouk Indeed. But the ± error is not the same as the standard deviation. The error should decrease as you increase the number of iterations, n_iter, whereas the standard deviation should remain stable. I suppose one could argue that we are more interested in the std than in the error in this case, but as you've pointed out, the function doesn't even do that.
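A toy illustration of that point (hypothetical numbers, not eli5 output): as the number of samples grows, the SD stays roughly constant while the SEM = SD/sqrt(n) shrinks.

```python
import numpy as np

rng = np.random.RandomState(0)
for n in (10, 100, 1000):
    # Simulated per-iteration score decreases for one feature.
    decreases = rng.normal(loc=0.5, scale=0.1, size=n)
    sd = decreases.std()
    sem = sd / np.sqrt(n)  # standard error of the mean
    print(f"n_iter={n:5d}  SD={sd:.3f}  SEM={sem:.4f}")
```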

There is also the problem that ELI5 silently ignores sample_weights.

aghoshpub avatar Mar 02 '20 18:03 aghoshpub

> @seralouk Indeed. But the ± error is not the same as the standard deviation. The error should decrease as you increase the number of iterations, n_iter, whereas the standard deviation should remain stable. I suppose one could argue that we are more interested in the std than in the error in this case, but as you've pointed out, the function doesn't even do that.
>
> There is also the problem that ELI5 silently ignores sample_weights.

If you have a look at https://github.com/TeamHG-Memex/eli5/blob/master/eli5/sklearn/permutation_importance.py#L209, you can see that np.std(results, axis=0) is used. The results variable holds the permuted scores, with dimensions [#iterations, #original_features]. Thus, the error is indeed the SD of the scores obtained on the permuted data.
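You can sanity-check this on a fitted object (a sketch, assuming perm is a fitted PermutationImportance and that results_ stacks to the [#iterations, #features] shape described above):

```python
import numpy as np

results = np.array(perm.results_)  # shape: [#iterations, #features]
assert np.allclose(perm.feature_importances_, results.mean(axis=0))
assert np.allclose(perm.feature_importances_std_, results.std(axis=0))
```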

Something important and strange: eli5.show_weights() displays the errors (SD) incorrectly (see here: https://github.com/TeamHG-Memex/eli5/issues/365).

So np.std(results, axis=0) is stored correctly in perms.feature_importances_std_, but eli5.show_weights() displays 2 * perms.feature_importances_std_ for some unknown reason.

seralouk avatar Mar 02 '20 19:03 seralouk

As a follow-up, it seems that they coded it as 2*std to get the 95% CI estimation (see the comment here: https://github.com/TeamHG-Memex/eli5/issues/365#issuecomment-597056611). However, this is still not correct; it should be 2 * (std / sqrt(n_iter)).

To conclude, I intend to use eli5 and get the SD from perms.feature_importances_std_, but then report my results differently from eli5.show_weights(): I am going to use the expected 2 * (perms.feature_importances_std_) / np.sqrt(len(perms.results_)) as the error values.
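In code, that reporting would look roughly like this (a sketch, assuming perms is the fitted PermutationImportance and feature_names is your own list of column names):

```python
import numpy as np

n = len(perms.results_)                                 # permutation rounds
ci95 = 2 * perms.feature_importances_std_ / np.sqrt(n)  # 2 * SEM
for name, mean, err in zip(feature_names, perms.feature_importances_, ci95):
    print(f"{name}: {mean:.4f} ± {err:.4f}")
```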

seralouk avatar Mar 10 '20 13:03 seralouk

Very interesting, thanks for digging into it. I think our issues can be merged.

aghoshpub avatar Mar 10 '20 14:03 aghoshpub

https://github.com/TeamHG-Memex/eli5/issues/365 is the same issue and has relevant discussion.

I don't think that dividing by sqrt(n_iterations) is needed anywhere (see https://github.com/TeamHG-Memex/eli5/issues/365#issuecomment-597242250; although I could be wrong, explanations are welcome), but the way we present the confidence intervals could be confusing, so any suggestions for how we can make it clearer are welcome, ideally with references.

lopuhin avatar Mar 10 '20 18:03 lopuhin

Thanks for the reply. Indeed, there is some confusion. Right now the show_weights() function prints mean ± 2*SD. If the intention is really to print a measure of CI, then it should be mean ± 2 * (SD / sqrt(n)) == mean ± 2*SEM in order to be the 95% CI of the mean value.

Ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2959222/

An alternative would be mean (SD). Note the parentheses: they tell us that it's the SD. But in any case, mean ± 2*SD does not make any sense, since it's neither the SEM nor the SD.

My suggestion is to make it either mean (SD) (most people expect that) or mean ± 2 * (SD / sqrt(n)). In either case, this should be mentioned in the documentation.

In other words, a format like X (Y) means mean (SD), and X ± Y would be mean ± CI. The OP of this issue was expecting the second because of the ± symbol.
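For concreteness, the two unambiguous conventions side by side (a sketch, again assuming a fitted perms object):

```python
import numpy as np

mean = perms.feature_importances_
sd = perms.feature_importances_std_
sem = sd / np.sqrt(len(perms.results_))

print(f"{mean[0]:.4f} ({sd[0]:.4f})")       # "mean (SD)" convention
print(f"{mean[0]:.4f} ± {2 * sem[0]:.4f}")  # "mean ± 2*SEM" (95% CI) convention
```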

seralouk avatar Mar 10 '20 18:03 seralouk

Elsewhere I got feedback that mean ± std is a standard way to show the mean and std, and that it's preferable to mean ± 2*std or mean (std), so I think we should update show_weights and similar functions to show this. What do you think, folks?

lopuhin avatar Mar 11 '20 08:03 lopuhin

Sure, mean ± std makes much more sense than mean ± 2*std. Additionally, using the results_ attribute of the PermutationImportance object, one can calculate pretty much everything else, such as the SEM.

seralouk avatar Mar 11 '20 08:03 seralouk

Thanks for getting back to us @lopuhin. That would already be better. But if you want to report the standard deviation, I would go for mean (std), as @seralouk suggests. In the scientific community ± has a meaning, and mean ± std would be rather deceptive.

> Elsewhere I got feedback that mean ± std is a standard way to show the mean and std,

I would be curious to see the literature. Happy to be wrong, I just haven't ever seen it.

aghoshpub avatar Mar 11 '20 12:03 aghoshpub

> I would be curious to see the literature. Happy to be wrong, I just haven't ever seen it.

This was an informal poll in a data science Slack community. I'll try to double-check with some sources.

lopuhin avatar Mar 11 '20 13:03 lopuhin