How is the error calculated for Permutation Importance?
If we do:
perm = PermutationImportance(D, random_state=1, n_iter=2, scoring=significance_scorer).fit(X_test, y_test)
eli5.show_weights(perm, feature_names=data.columns.tolist())
then we get some scores with +- errors. But these errors don't correspond to std/sqrt(n_iterations). How are these errors computed?
The errors are given by the score_func() of your estimator, but I am not sure if the values returned by eli5's get_score_importances() are really in the units of the score. I used 'negative RMSE' as the score. I have a custom score function significance_scorer; the scores are in the right units, but the errors make no sense to me. As a result, I made my own Permutation Importance implementation that is more transparent, and I don't use ELI5 anymore.
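For reference, a scorer along the lines of 'negative RMSE' can be passed to PermutationImportance through the scoring argument. The sketch below is only illustrative; neg_rmse, model, X_test and y_test are placeholder names, not the actual significance_scorer used above.

```python
# Illustrative sketch only: a "negative RMSE" scorer for PermutationImportance.
# `model`, `X_test`, `y_test` are placeholders for a fitted estimator and a test set.
import numpy as np
from sklearn.metrics import make_scorer, mean_squared_error
from eli5.sklearn import PermutationImportance

def neg_rmse(y_true, y_pred):
    # Larger (closer to zero) is better, as sklearn scorers expect.
    return -np.sqrt(mean_squared_error(y_true, y_pred))

neg_rmse_scorer = make_scorer(neg_rmse, greater_is_better=True)

# perm = PermutationImportance(model, scoring=neg_rmse_scorer,
#                              n_iter=10, random_state=1).fit(X_test, y_test)
```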
@aghoshpub Why did the errors make no sense to you? I am seeing negative importances and I doubt that my model would give high weights to features that worsen the score...
What is the formula used to calculate the errors? At least it doesn't correspond to std/sqrt(n_iterations), so what is it?
Indeed, I just opened an issue with a similar question to yours: https://github.com/TeamHG-Memex/eli5/issues/365.
I have found a connection, but I am not sure why that is. Also, I am still not sure whether the displayed weights are actually the reduction in model performance or just the model's weights for the features/variables.
I did some deep research.
After going through the source code, here is what I believe for the case where cv is used and is not prefit or None. I use a K-Folds scheme for my application. I also use an SVC model, so score is the accuracy in this case.
By looking at the fit method of the PermutationImportance object, the _cv_scores_importances are computed (https://github.com/TeamHG-Memex/eli5/blob/master/eli5/sklearn/permutation_importance.py#L202). The specified cross-validation scheme is used, and the base_scores, feature_importances are returned using the test data (function: _get_score_importances inside _cv_scores_importances).
By looking at the get_score_importances function (https://github.com/TeamHG-Memex/eli5/blob/master/eli5/permutation_importance.py#L55), we can see that base_score is the score on the non-shuffled data, and feature_importances (called scores_decreases there) are defined as non-shuffled score minus shuffled score (see https://github.com/TeamHG-Memex/eli5/blob/master/eli5/permutation_importance.py#L93).
Finally, the errors (feature_importances_std_) are the SD of the above feature_importances (https://github.com/TeamHG-Memex/eli5/blob/master/eli5/sklearn/permutation_importance.py#L209), and feature_importances_ is the mean of the above feature_importances (non-shuffled score minus shuffled score).
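In plain numpy, the computation described above boils down to something like the following sketch (this is not eli5's actual code; score_func stands in for whatever scorer is used):

```python
# A plain-numpy sketch of the logic described above (not eli5's actual code):
# importance = non-shuffled score minus shuffled score, repeated n_iter times;
# eli5 then stores the mean in feature_importances_ and the SD in feature_importances_std_.
import numpy as np

def permutation_importances(score_func, X, y, n_iter=5, seed=0):
    rng = np.random.RandomState(seed)
    base_score = score_func(X, y)                          # score on untouched data
    decreases = np.empty((n_iter, X.shape[1]))
    for it in range(n_iter):
        for j in range(X.shape[1]):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # shuffle one column
            decreases[it, j] = base_score - score_func(X_perm, y)
    # mean ~ feature_importances_, std ~ feature_importances_std_
    return decreases.mean(axis=0), decreases.std(axis=0)
```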
All of the above answers your question but not mine: https://github.com/TeamHG-Memex/eli5/issues/365
@seralouk Indeed. But the +- error is not the same as the standard deviation. The error should decrease as you increase the number of iterations, n_iter, whereas the standard deviation should remain stable. I suppose one could argue that we are more interested in the std than the error in this case, but as you've pointed out, the function doesn't even do that.
There is also the problem that ELI5 silently ignores sample_weights.
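A toy simulation (synthetic numbers, unrelated to any real model or to eli5) makes the distinction concrete: the SD of the per-iteration score drops stays roughly constant, while SD/sqrt(n_iter) shrinks as iterations are added.

```python
# Synthetic illustration: SD stays ~stable as n_iter grows, SEM = SD/sqrt(n) shrinks.
import numpy as np

rng = np.random.RandomState(0)
for n_iter in (5, 50, 500):
    decreases = rng.normal(loc=0.05, scale=0.02, size=n_iter)  # fake score drops
    sd = decreases.std()
    print(f"n_iter={n_iter:4d}  SD={sd:.4f}  SEM={sd / np.sqrt(n_iter):.4f}")
```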
If you have a look at https://github.com/TeamHG-Memex/eli5/blob/master/eli5/sklearn/permutation_importance.py#L209, you can see that np.std(results, axis=0) is used. The results variable holds the permuted scores with dimensions [#iterations, #original_features]. Thus, the error is indeed the SD of the scores obtained on the permuted data.
Something important and strange: eli5.show_weights() displays the errors (SD) wrongly (see here: https://github.com/TeamHG-Memex/eli5/issues/365). So np.std(results, axis=0) is stored correctly in perms.feature_importances_std_, but eli5.show_weights() displays 2 * perms.feature_importances_std_ for some unknown reason.
As a follow-up, it seems that they coded it as 2*std to get the 95% CI estimate (see the comment here: https://github.com/TeamHG-Memex/eli5/issues/365#issuecomment-597056611). However, this is still not correct, and it should be 2 * (std/sqrt(n_iter)).
To conclude, I intend to use eli5 and get the SD from perms.feature_importances_std_, but then I am going to report my results differently compared to eli5.show_weights(). I am going to use the expected 2 * (perms.feature_importances_std_) / (np.sqrt(len(perms.results_))) as error values.
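Concretely, that reporting looks roughly like the sketch below (perm stands for a fitted PermutationImportance object; the attribute names are the ones discussed in this thread):

```python
# Sketch: report the mean importance with a 2*SEM (~95% CI of the mean) error bar,
# instead of the 2*SD that show_weights prints; `perm` is a fitted PermutationImportance.
import numpy as np

def mean_and_ci(perm):
    n = len(perm.results_)                         # number of permutation rounds
    sem = perm.feature_importances_std_ / np.sqrt(n)
    return perm.feature_importances_, 2 * sem
```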
Very interesting, thanks for digging into it. I think our issues can be merged.
https://github.com/TeamHG-Memex/eli5/issues/365 is the same issue and has relevant discussion.
I don't think that dividing by sqrt(n_iterations) is needed anywhere (see https://github.com/TeamHG-Memex/eli5/issues/365#issuecomment-597242250 - although I could be wrong, explanations are welcome), but the way we present the confidence intervals could be confusing, so any suggestions on how we can make it clearer are welcome, ideally with references.
Thanks for the reply. Indeed, there is some confusion. Right now the show_weights() function prints mean +- 2*SD. If the intention is really to print a measure of CI, then it should be mean +- 2 * (SD / sqrt(n)) == mean +- 2*SEM in order to be the 95% CI of the mean value.
Ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2959222/
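Spelled out: SEM = SD / sqrt(n), so the 95% CI of the mean is approximately mean +- 2*SEM = mean +- 2*(SD / sqrt(n)); by contrast, mean +- 2*SD describes the spread of the individual iterations rather than the uncertainty of the mean.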
An alternative would be mean (SD). Note the parentheses -- this tells us that it's the SD. But in any case, mean +- 2*SD does not make sense, since it's neither the SEM nor the SD.
My suggestion is either to make it mean (SD) (most people expect that) or mean +- 2 * (SD / sqrt(n)). In either case, this should be mentioned in the documentation.
In other words, a format like X(Y) means mean(SD), and X +- Y would be mean +- CI.
The OP of this issue was expecting the second due to the +- symbol.
Elsewhere I got feedback that mean ± std is a standard way to show the mean and std, and that it's preferable to mean ± 2 * std or mean (std), so I think we should update show_weights and similar functions to show this. What do you think, folks?
Sure, mean ± std makes much more sense compared to mean ± 2*std. Additionally, using the results_ attribute of the PermutationImportance object, one can calculate pretty much everything, like the SEM etc.
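For example (a sketch, assuming perm is a fitted PermutationImportance and results_ holds one array of score decreases per permutation round, as described earlier in the thread):

```python
# Sketch: derive per-feature mean, SD and SEM directly from results_.
import numpy as np

def summarize(perm):
    results = np.asarray(perm.results_)    # shape: (n_rounds, n_features)
    mean = results.mean(axis=0)            # ~ feature_importances_
    sd = results.std(axis=0)               # ~ feature_importances_std_
    sem = sd / np.sqrt(results.shape[0])
    return mean, sd, sem
```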
Thanks for getting back to us @lopuhin. That would already be better. But if you want to report the standard deviation, I would go for mean (std) as @seralouk suggests. In the scientific community, +- has a meaning, and mean +- std would be rather deceptive.
> Elsewhere I got feedback that mean ± std is a standard way to show mean and std,
I would be curious to see the literature. Happy to be wrong, I just haven't ever seen it.
> I would be curious to see the literature. Happy to be wrong, I just haven't ever seen it.
This was an informal poll in a data science slack community. I'll try to double-check with some sources.