Time series data with DICE?

Open Saladino93 opened this issue 3 years ago • 8 comments

Hi everyone,

I am still quite new to machine learning coding. I have managed to do a few things with DiCE, but I was wondering how you handle time series data for a general model.

The idea is that the counterfactual explainer has to respect time ordering when generating counterfactuals. The only idea I have so far is to post-filter the counterfactuals after they are generated. Is this simple approach OK, or are there better ways?

Saladino93 avatar Jul 07 '21 13:07 Saladino93

This is a great question and depends on the causal relationships between features. We had some preliminary work on this but it may take some time to be integrated fully with DiCE.

I think your post-filtering step is simple and effective. As long as you generate enough CFs, post-filtering them should return the desired CFs. In the future, we may implement a function for specifying these kinds of constraints. To help design it, can you share what would be ideal for you? Would you want to provide the two variables and which one causes the other? Or do you have some other kind of constraint in mind?
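A minimal sketch of such a post-filtering step, in plain Python. It assumes the generated counterfactuals have already been converted to a list of dicts (for example from the `final_cfs_df` DataFrame that DiCE attaches to each result) and that the dict keys are ordered from oldest to newest lag. The helper names `changed_features` and `post_filter`, and the two rules themselves (only some lags may change, and consecutive lags may not jump by more than `max_jump`), are hypothetical illustrations, not part of DiCE.

```python
from typing import Dict, Iterable, List

Row = Dict[str, float]


def changed_features(query: Row, cf: Row, tol: float = 1e-9) -> List[str]:
    """Return the names of features whose value differs between query and cf."""
    return [name for name in query if abs(cf[name] - query[name]) > tol]


def post_filter(
    query: Row, cfs: Iterable[Row], mutable: Iterable[str], max_jump: float
) -> List[Row]:
    """Keep counterfactuals that (a) change only `mutable` features and
    (b) never move by more than `max_jump` between consecutive lags."""
    mutable_set = set(mutable)
    lag_order = list(query)  # assumption: keys are ordered oldest -> newest
    kept = []
    for cf in cfs:
        # Rule (a): reject CFs that touch a feature we consider frozen.
        if not set(changed_features(query, cf)) <= mutable_set:
            continue
        # Rule (b): reject CFs with an implausible jump between neighbours.
        values = [cf[name] for name in lag_order]
        if any(abs(b - a) > max_jump for a, b in zip(values, values[1:])):
            continue
        kept.append(cf)
    return kept


query = {"t-3": 10.0, "t-2": 11.0, "t-1": 12.0}
candidates = [
    {"t-3": 10.0, "t-2": 11.0, "t-1": 14.0},  # changes only t-1, stays smooth
    {"t-3": 20.0, "t-2": 11.0, "t-1": 12.0},  # changes the frozen t-3
    {"t-3": 10.0, "t-2": 11.0, "t-1": 30.0},  # jump of 19 between t-2 and t-1
]
valid = post_filter(query, candidates, mutable=["t-2", "t-1"], max_jump=5.0)
```

Only the first candidate survives here. Since filtering discards candidates, more CFs have to be requested up front than are finally needed.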

amit-sharma avatar Jul 07 '21 16:07 amit-sharma

@amit-sharma thanks for your reply! Yes, I am aware of that work, although it does not seem to deal directly with time series. Maybe with some manipulation one could make it work.

For me it would be ideal to be able to choose the constraints, and to respect both causality among features at a fixed time and causality across time.

I do not have specific constraints in mind for now (but if I come up with any I will let you know).

For now, I have been running this example, https://machinelearningmastery.com/feature-selection-time-series-forecasting-python/ , to see how time series prediction can be done in Python, and then I ran DiCE on top of it.

Basically, after some feature engineering on the time series data, you have the variables [t-12, t-11, ..., t-2, t-1, t], and you want to predict t from the lagged ones. After training, I run DiCE to see what I could change in my lagged variables (although this is probably more of a what-if analysis than something I can actually act on, since it changes variables in the past).
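The lag layout described above can be sketched in plain Python as follows. The helper `make_lag_rows` is hypothetical and only illustrates the windowing; with pandas the same columns are typically built with `Series.shift`.

```python
from typing import List, Sequence, Tuple


def make_lag_rows(
    series: Sequence[float], n_lags: int
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Turn a univariate series into (feature names, lag rows, targets).

    Each row holds the `n_lags` values preceding the target, ordered from
    the oldest lag (t-n_lags) to the most recent one (t-1).
    """
    names = [f"t-{k}" for k in range(n_lags, 0, -1)]
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(list(series[i - n_lags:i]))  # the window before position i
        y.append(series[i])                   # the value to predict
    return names, X, y


names, X, y = make_lag_rows([1, 2, 3, 4, 5, 6], n_lags=3)
# names -> ['t-3', 't-2', 't-1']
# X     -> [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
# y     -> [4, 5, 6]
```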

Here is an example

image

The bigger goal would be to run this on store sales data, for example, like here https://www.kaggle.com/kyakovlev/m5-lags-features/notebook, although I have yet to try it there.

One last thing. For now, I will generate lots of CFs and check which ones make sense when I filter (I still have to develop an automatic way to do this). How safe is it, though, to generate many CFs, say 10 or 20? Did you see this paper on the robustness of CF generation, https://arxiv.org/pdf/2106.02666v1.pdf (Counterfactual Explanations Can Be Manipulated)? I might open another issue to discuss this, as it seems quite important.

Also, I think I will code the necessity and sufficiency metrics that you implemented in a recent paper of yours. Lots of stuff to discuss here, but step by step.

Thanks, and hopefully I will have something more concrete in the future.

Saladino93 avatar Jul 08 '21 16:07 Saladino93

@amit-sharma sorry to disturb you again, but would you have any idea if DICE will support time series, or how to do this?

Saladino93 avatar Jul 16 '21 09:07 Saladino93

Hi @amit-sharma is there any update on having support for time series?

shreyakhandelwal07 avatar Oct 05 '22 11:10 shreyakhandelwal07

Hi @amit-sharma. I was wondering whether we should introduce feature lags as new features to deal with time series data. That is, if we have features A, B, C, do you think we should use A, B, C, A_lag1, B_lag1, ... up to a certain lag and then perform CF reasoning?

Otherwise, please let us know if you have any updates.

asha24choudhary avatar Feb 14 '24 11:02 asha24choudhary

Or, I was wondering, should we pass a model that takes temporal dependencies into account as the model parameter?

asha24choudhary avatar Feb 14 '24 12:02 asha24choudhary

Yeah, feature lags are the best solution currently in DiCE.
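A minimal sketch of the lag-feature approach for several features, in plain Python. The helper `add_lags` and the `<feature>_lag<k>` naming are assumptions for illustration. DiCE's `generate_counterfactuals` accepts a `features_to_vary` list, so the lag column names collected at the end could be used to control which time steps are allowed to change; that pairing is a suggestion, not something confirmed in this thread.

```python
from typing import Dict, List


def add_lags(
    rows: List[Dict[str, float]], features: List[str], max_lag: int
) -> List[Dict[str, float]]:
    """Add <feature>_lag<k> columns to time-ordered rows (oldest first).

    The first `max_lag` rows are dropped because their lags are undefined.
    """
    out = []
    for i in range(max_lag, len(rows)):
        new_row = dict(rows[i])  # keep the current values A, B, ...
        for name in features:
            for k in range(1, max_lag + 1):
                new_row[f"{name}_lag{k}"] = rows[i - k][name]
        out.append(new_row)
    return out


rows = [
    {"A": 1.0, "B": 10.0},
    {"A": 2.0, "B": 20.0},
    {"A": 3.0, "B": 30.0},
]
lagged = add_lags(rows, ["A", "B"], max_lag=1)
# Names of the lagged columns, e.g. to exclude from or include in the
# set of features a counterfactual is allowed to change.
lag_columns = [col for col in lagged[0] if "_lag" in col]
```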

amit-sharma avatar Feb 15 '24 07:02 amit-sharma

Oh thank you :)

asha24choudhary avatar Feb 15 '24 09:02 asha24choudhary