
Importance Measure for Counterfactuals with Reinforcement Learning

Open · HeyItsBethany3 opened this issue · 2 comments

Do you have any method for seeing which feature has the most importance/changeability power in a counterfactual? (Which features do the most to move the counterfactual towards the decision boundary?)

Do you have any functions within the process of generating counterfactuals that can be used to simplify this problem?

Thank you!

Also, as a separate question, I've noticed that when I use many restrictions, these are not always applied, perhaps because the algorithm struggles to find a counterfactual that satisfies so many restrictions. Do you have any rules/insights around this?

HeyItsBethany3 · Jul 12 '22

Dear @HeyItsBethany3, thanks for the question.

Do you have any method for seeing which feature has the most importance/changeability power in a counterfactual? (Which features do the most to move the counterfactual towards the decision boundary?)

Do you have any functions within the process of generating counterfactuals that can be used to simplify this problem?

This is an interesting idea. There is no functionality within the CounterfactualRL method itself to check this. However, what you could do is try the integrated gradients method, setting the initial data point as the baseline and the counterfactual as the instance being attributed. Perhaps this will show which features contribute most to the change in class. I've tried this out with a classifier trained on MNIST.

[Figure mnist-cf-lfa: integrated gradients attributions for an MNIST counterfactual]

Here the blue pixels correspond to the features that contribute most to the classification as a 4, and the red ones are those that contribute negatively.
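
If it helps, here's roughly what that looks like in code. This is only a minimal sketch: the model and the two instances below are random stand-ins that you'd replace with your own classifier, your original instance and the counterfactual returned by CFRL.

```python
import numpy as np
import tensorflow as tf
from alibi.explainers import IntegratedGradients

# Dummy stand-ins so the snippet runs end-to-end; swap in your real
# classifier, original instance and CFRL counterfactual.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
x_orig = np.random.rand(1, 10).astype(np.float32)  # original instance
x_cf = np.random.rand(1, 10).astype(np.float32)    # counterfactual

# Attribute the move from the original (used as the baseline) to the
# counterfactual, with respect to the class the counterfactual lands in.
cf_class = int(np.argmax(model(x_cf).numpy(), axis=1)[0])
ig = IntegratedGradients(model, method="gausslegendre", n_steps=50)
explanation = ig.explain(x_cf, baselines=x_orig, target=cf_class)

attrs = explanation.attributions[0][0]  # shape (10,)
# Rank features by absolute contribution to the change in class.
print(np.argsort(-np.abs(attrs)))
```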

Out of interest, what's your use case for doing this? It would be great to know more.

Also, as a separate question, I've noticed that when I use many restrictions, these are not always applied, perhaps because the algorithm struggles to find a counterfactual that satisfies so many restrictions. Do you have any rules/insights around this?

This definitely isn't intended behaviour. What happens here, exactly? Does the algorithm find the relevant counterfactual, or does it fail to do so? It would be great if you could put together a minimal working example that reproduces this.

mauicv · Jul 27 '22

Hi @mauicv,

Thanks so much for your response. The use case is a credit model with over 100 variables. A big issue I've had is reducing explanations to a reasonable length (sparsity); most explanations included over 50 feature changes. So I've been looking to see if some changes are more important than others.

The way I've done this so far is by calculating how much each feature change moves the instance towards the decision boundary, then keeping the important changes and discarding the insignificant ones. Interestingly, this drops the number of features in an explanation to fewer than 10, and valid counterfactuals can still be found (though they are closer to the boundary). I was surprised that the algorithm didn't do these checks already, but thought maybe it was struggling because of how many features there are.
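
For concreteness, here's a simplified sketch of the pruning step. The names are placeholders: `model` is any binary classifier exposing `predict_proba`, `x` is the original instance as a 1-D array, and `cf` is the counterfactual.

```python
import numpy as np

def prune_counterfactual(model, x, cf, target_class, threshold=0.01):
    """Revert the counterfactual's insignificant feature changes.

    For each changed feature, apply that change alone to the original
    instance, measure how much it moves the predicted probability of
    the target class, and keep only changes above `threshold`.
    """
    base_p = model.predict_proba(x.reshape(1, -1))[0, target_class]

    contributions = {}
    for i in np.where(x != cf)[0]:
        x_single = x.copy()
        x_single[i] = cf[i]  # apply this one change in isolation
        p = model.predict_proba(x_single.reshape(1, -1))[0, target_class]
        contributions[i] = p - base_p

    pruned = x.copy()
    for i, delta in contributions.items():
        if delta > threshold:  # keep only the significant changes
            pruned[i] = cf[i]
    return pruned, contributions
```

Afterwards I check that the pruned instance is still classified as the target class, i.e. that it remains a valid counterfactual.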

I'll definitely look into the integrated gradient method, thank you.

The algorithm gives back a valid counterfactual, but some immutable variables change in it. The problem is much worse when using a high number of restrictions, i.e. when specifying that many variables are immutable.

HeyItsBethany3 · Aug 17 '22