
Possible suggestions/improvements

Open Saladino93 opened this issue 4 years ago • 3 comments

I am starting to get more familiar with DiCE. It is a fantastic library; thanks a lot, guys.

I wanted to ask what your roadmap for improvements is (apart from the things mentioned in the intro docs).

I am thinking:

  • Allow for a tolerance parameter when varying continuous features. In the local_feature_importance function in http://interpret.ml/DiCE/_modules/dice_ml/explainer_interfaces/explainer_base.html#ExplainerBase.local_feature_importance, for example, there is an np.isclose check: I think this could be improved by exposing an absolute or relative tolerance, as sometimes changing a value from x to x(1+1e-5) is not really changing it. Then, for a given feature, you exclude all the examples that are within the tolerance. (Also, it seems that the feature attribution scores are sometimes greater than one! I am not sure whether this is a bug.)
  • Allow for filtering of the cf_examples_list. I implemented something basic: once you generate your counterfactuals, you can set up some tabu_changes, and any such change found in your CFs is excluded from the final count.
  • Related to this: causality. I know there is now an option for differentiable models, but it is still in its infancy, and I do not know what work is currently being done. It would be really nice to have. It seems to me that Bayesian frameworks, or more generally any way of learning a distribution (as with a VAE), are a popular approach. I do not know of any others.
  • Related to learning distributions: time series counterfactuals. It was suggested here to rearrange the data to make it work. So I think lagging the series to turn it into a supervised learning problem and then using it with DiCE is not a problem; I already tried this. The problem is how to preserve causality, and how to be sure that the correlations in the original data are maintained. Or has anyone been able to get time series counterfactuals from DiCE?
  • Make DiCE more robust. A recent paper, https://arxiv.org/abs/2106.02666, suggests that DiCE is not very robust, if I understand it correctly. The authors give some advice for reducing this problem (adding noise to the initialization of the counterfactual search, reducing the set of features used to compute counterfactuals, and reducing the model complexity).
  • One nice thing would be a reason generator, or what you call in the docs an English language engine. I wrote something very basic that says: if you flip this or change that, you obtain a different outcome. The problem is that I am not sure how to relate this to the importance scores, as sometimes several variables have a value of 1 (not to mention, again, that I sometimes obtained a value greater than one!).
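
To make the first two suggestions concrete, here is a minimal sketch in plain Python. It does not use the dice_ml API: instances are plain dicts, and the names changed_features, filter_tabu, rel_tol and abs_tol are hypothetical, chosen only for illustration. It also assumes that a counterfactual touching a tabu feature is dropped entirely, which is only one possible reading of the filtering idea.

```python
import math


def changed_features(original, counterfactual, rel_tol=1e-3, abs_tol=0.0):
    """Return the names of features that differ beyond the given tolerance."""
    changed = set()
    for name, old in original.items():
        new = counterfactual[name]
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            # Continuous feature: a change within tolerance does not count.
            if not math.isclose(old, new, rel_tol=rel_tol, abs_tol=abs_tol):
                changed.add(name)
        elif old != new:
            # Categorical feature: any different value counts as a change.
            changed.add(name)
    return changed


def filter_tabu(original, counterfactuals, tabu_changes, **tol):
    """Keep only counterfactuals that leave every tabu feature unchanged."""
    tabu = set(tabu_changes)
    return [
        cf for cf in counterfactuals
        if not (changed_features(original, cf, **tol) & tabu)
    ]


# Hypothetical data, for illustration only.
original = {"age": 40.0, "income": 50000.0, "job": "clerk"}
cfs = [
    {"age": 40.0, "income": 50000.5, "job": "manager"},  # income shift is within tolerance
    {"age": 45.0, "income": 50000.0, "job": "clerk"},    # changes the tabu feature "age"
    {"age": 40.0, "income": 62000.0, "job": "clerk"},    # changes income only
]
kept = filter_tabu(original, cfs, tabu_changes={"age"})
```

With the tolerance in place, the first counterfactual counts as changing only "job", and the second one is filtered out because it modifies "age". The same idea would carry over to np.isclose through its rtol and atol arguments.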

I am sure you have already thought about some of these (e.g. English), and other things are going on. Curious to know your thoughts!
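
The very basic reason generator mentioned in the last bullet could look roughly like the sketch below. It is not the DiCE implementation, and it ignores importance scores entirely; the function name explain_change and the dict-based representation of an instance are assumptions made for illustration.

```python
def explain_change(original, counterfactual, outcome_name="outcome"):
    """Describe in one English sentence which features differ in a counterfactual."""
    parts = []
    for name, old in original.items():
        new = counterfactual[name]
        if old != new:
            parts.append(f"{name} changes from {old} to {new}")
    if not parts:
        return f"No feature changes, so the {outcome_name} stays the same."
    # Join all changes into a single conditional sentence.
    return "If " + " and ".join(parts) + f", the {outcome_name} flips."


# Hypothetical instance and counterfactual, for illustration only.
sentence = explain_change(
    {"age": 40, "job": "clerk"},
    {"age": 40, "job": "manager"},
)
```

Ranking the clauses by importance score would be a natural next step, once it is clear how scores that tie at 1 should be ordered.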

Saladino93 avatar Jul 28 '21 14:07 Saladino93

All these are great ideas, @Saladino93. Thanks for sharing them. The first two can be implemented. The fourth one is on our roadmap, but time series support needs more thought. Causality is harder, and it may take some time for the research to mature before we can think of putting it in DiCE.

For the last two suggestions, would you like to contribute to DiCE? It would be really nice to have the last one especially, the English reason generator.

amit-sharma avatar Jul 28 '21 15:07 amit-sharma

@amit-sharma Sure, I can try to help! I will write something up and share it with you!

Saladino93 avatar Jul 28 '21 20:07 Saladino93

@amit-sharma would you mind sharing some thoughts on the time series counterfactuals please?

Saladino93 avatar Jul 30 '21 10:07 Saladino93