How is the error calculated for Permutation Importance?
If we do:
perm = PermutationImportance(D, random_state=1, n_iter=2, scoring=significance_scorer).fit(X_test, y_test)
eli5.show_weights(perm, feature_names=data.columns.tolist())
then we get some scores with +- errors. But these errors don't correspond to std/sqrt(n_iterations). How are these errors computed?
The errors are given by the score_func() of your estimator, but I am not sure if the values returned by eli5's get_score_importances() are really in the units of the score. I used 'negative RMSE' as the score. I have a custom score function significance_scorer; the scores are in the right units, but the errors make no sense to me. As a result, I made my own Permutation Importance implementation that is more transparent, and I don't use ELI5 anymore.
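For reference, a scorer along the lines of 'negative RMSE' can be passed to PermutationImportance through the scoring argument. The sketch below is only illustrative; neg_rmse, model, X_test and y_test are placeholder names, not the actual significance_scorer used above.

```python
# Illustrative sketch only: a "negative RMSE" scorer for PermutationImportance.
# `model`, `X_test`, `y_test` are placeholders for a fitted estimator and a test set.
import numpy as np
from sklearn.metrics import make_scorer, mean_squared_error
from eli5.sklearn import PermutationImportance

def neg_rmse(y_true, y_pred):
    # Larger (closer to zero) is better, as sklearn scorers expect.
    return -np.sqrt(mean_squared_error(y_true, y_pred))

neg_rmse_scorer = make_scorer(neg_rmse, greater_is_better=True)

# perm = PermutationImportance(model, scoring=neg_rmse_scorer,
#                              n_iter=10, random_state=1).fit(X_test, y_test)
```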
@aghoshpub Why did the errors make no sense to you? I am seeing negative importances and I doubt that my model would give high weights to features that worsen the score...
What is the formula used to calculate the errors? At least it doesn't correspond to std/sqrt(n_iterations), so what is it?
Indeed, I just opened an issue with a similar question to yours: https://github.com/TeamHG-Memex/eli5/issues/365.
I have found a connection, but I am not sure why that is. Also, I am still not sure whether the displayed weights are actually the reduction in model performance or just the model's weights for the features/variables.
I did some deep research.
After going through the source code, here is what I believe for the case where cv is used and is not prefit or None. I use a K-Folds scheme for my application. I also use an SVC model, so score is the accuracy in this case.
By looking at the fit method of the PermutationImportance object, the _cv_scores_importances are computed (https://github.com/TeamHG-Memex/eli5/blob/master/eli5/sklearn/permutation_importance.py#L202). The specified cross-validation scheme is used, and the base_scores, feature_importances are returned using the test data (function: _get_score_importances inside _cv_scores_importances).
By looking at the get_score_importances function (https://github.com/TeamHG-Memex/eli5/blob/master/eli5/permutation_importance.py#L55), we can see that base_score is the score on the non-shuffled data, and feature_importances (called scores_decreases there) are defined as non-shuffled score minus shuffled score (see https://github.com/TeamHG-Memex/eli5/blob/master/eli5/permutation_importance.py#L93).
Finally, the errors (feature_importances_std_) are the SD of the above feature_importances (https://github.com/TeamHG-Memex/eli5/blob/master/eli5/sklearn/permutation_importance.py#L209), and feature_importances_ is the mean of the above feature_importances (non-shuffled score minus shuffled score).
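In plain numpy, the computation described above boils down to something like the following sketch (this is not eli5's actual code; score_func stands in for whatever scorer is used):

```python
# A plain-numpy sketch of the logic described above (not eli5's actual code):
# importance = non-shuffled score minus shuffled score, repeated n_iter times;
# eli5 then stores the mean in feature_importances_ and the SD in feature_importances_std_.
import numpy as np

def permutation_importances(score_func, X, y, n_iter=5, seed=0):
    rng = np.random.RandomState(seed)
    base_score = score_func(X, y)                          # score on untouched data
    decreases = np.empty((n_iter, X.shape[1]))
    for it in range(n_iter):
        for j in range(X.shape[1]):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # shuffle one column
            decreases[it, j] = base_score - score_func(X_perm, y)
    # mean ~ feature_importances_, std ~ feature_importances_std_
    return decreases.mean(axis=0), decreases.std(axis=0)
```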
All of the above answers your question but not mine: https://github.com/TeamHG-Memex/eli5/issues/365
@seralouk Indeed. But the +- error is not the same as the standard deviation. The error should decrease as you increase the number of iterations, n_iter, whereas the standard deviation should remain stable. I suppose one could argue that we are more interested in the std than the error in this case, but as you've pointed out, the function doesn't even do that.
There is also the problem that ELI5 silently ignores sample_weights.
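A toy simulation (synthetic numbers, unrelated to any real model or to eli5) makes the distinction concrete: the SD of the per-iteration score drops stays roughly constant, while SD/sqrt(n_iter) shrinks as iterations are added.

```python
# Synthetic illustration: SD stays ~stable as n_iter grows, SEM = SD/sqrt(n) shrinks.
import numpy as np

rng = np.random.RandomState(0)
for n_iter in (5, 50, 500):
    decreases = rng.normal(loc=0.05, scale=0.02, size=n_iter)  # fake score drops
    sd = decreases.std()
    print(f"n_iter={n_iter:4d}  SD={sd:.4f}  SEM={sd / np.sqrt(n_iter):.4f}")
```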
If you have a look at https://github.com/TeamHG-Memex/eli5/blob/master/eli5/sklearn/permutation_importance.py#L209, you can see that np.std(results, axis=0) is used. The results variable holds the permuted scores with dimensions [#iterations, #original_features]. Thus, the error is indeed the SD of the scores obtained on the permuted data.
Something important and strange: eli5.show_weights() displays the errors (SD) wrongly (see here: https://github.com/TeamHG-Memex/eli5/issues/365). So np.std(results, axis=0) is stored correctly in perms.feature_importances_std_, but eli5.show_weights() displays 2 * perms.feature_importances_std_ for some unknown reason.
As a follow-up, it seems that they coded it as 2*std to get the 95% CI estimate (see the comment here: https://github.com/TeamHG-Memex/eli5/issues/365#issuecomment-597056611). However, this is still not correct, and it should be 2 * (std/sqrt(n_iter)).
To conclude, I intend to use eli5 and get the SD from perms.feature_importances_std_, but then I am going to report my results differently compared to eli5.show_weights(). I am going to use the expected 2 * (perms.feature_importances_std_) / (np.sqrt(len(perms.results_))) as error values.
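Concretely, that reporting looks roughly like the sketch below (perm stands for a fitted PermutationImportance object; the attribute names are the ones discussed in this thread):

```python
# Sketch: report the mean importance with a 2*SEM (~95% CI of the mean) error bar,
# instead of the 2*SD that show_weights prints; `perm` is a fitted PermutationImportance.
import numpy as np

def mean_and_ci(perm):
    n = len(perm.results_)                         # number of permutation rounds
    sem = perm.feature_importances_std_ / np.sqrt(n)
    return perm.feature_importances_, 2 * sem
```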
Very interesting, thanks for digging into it. I think our issues can be merged.
https://github.com/TeamHG-Memex/eli5/issues/365 is the same issue and has relevant discussion.
I don't think that dividing by sqrt(n_iterations) is needed anywhere (see https://github.com/TeamHG-Memex/eli5/issues/365#issuecomment-597242250 - although I could be wrong, explanations are welcome), but the way we present the confidence intervals could be confusing, so any suggestions on how we can make it clearer are welcome, ideally with references.
Thanks for the reply. Indeed, there is some confusion. Right now the show_weights() function prints mean +- 2*SD. If the intention is really to print a measure of CI, then it should be mean +- 2 * (SD / sqrt(n)) == mean +- 2*SEM in order to be the 95% CI of the mean value.
Ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2959222/
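Spelled out: SEM = SD / sqrt(n), so the 95% CI of the mean is approximately mean +- 2*SEM = mean +- 2*(SD / sqrt(n)); by contrast, mean +- 2*SD describes the spread of the individual iterations rather than the uncertainty of the mean.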
An alternative would be mean (SD). Note the parentheses -- this tells us that it's the SD. But in any case, mean +- 2*SD does not make sense, since it's neither the SEM nor the SD.
My suggestion is either to make it mean (SD) (most people expect that) or mean +- 2 * (SD / sqrt(n)). In either case, this should be mentioned in the documentation.
In other words, a format like X(Y) means mean(SD), and X +- Y would be mean +- CI.
The OP of this issue was expecting the second due to the +- symbol.
Elsewhere I got feedback that mean ± std is a standard way to show the mean and std, and that it's preferable to mean ± 2 * std or mean (std), so I think we should update show_weights and similar functions to show this. What do you think, folks?
Sure, mean ± std makes much more sense compared to mean ± 2*std. Additionally, using the results_ attribute of the PermutationImportance object, one can calculate pretty much everything, like the SEM etc.
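For example (a sketch, assuming perm is a fitted PermutationImportance and results_ holds one array of score decreases per permutation round, as described earlier in the thread):

```python
# Sketch: derive per-feature mean, SD and SEM directly from results_.
import numpy as np

def summarize(perm):
    results = np.asarray(perm.results_)    # shape: (n_rounds, n_features)
    mean = results.mean(axis=0)            # ~ feature_importances_
    sd = results.std(axis=0)               # ~ feature_importances_std_
    sem = sd / np.sqrt(results.shape[0])
    return mean, sd, sem
```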
Thanks for getting back to us @lopuhin. That would already be better. But if you want to report the standard deviation, I would go for mean (std) as @seralouk suggests. In the scientific community, +- has a meaning, and mean +- std would be rather deceptive.
> Elsewhere I got feedback that mean ± std is a standard way to show mean and std,
I would be curious to see the literature. Happy to be wrong, I just haven't ever seen it.
> I would be curious to see the literature. Happy to be wrong, I just haven't ever seen it.
This was an informal poll in a data science slack community. I'll try to double-check with some sources.