How to early stop? The regression task runs without results when the expected output is too high

Open xueyagaga opened this issue 2 years ago • 3 comments

My regression model's predictions fall in the range [1, 30], but most targets are below 2. When I generate counterfactuals for a particular instance with a desired output of 10 or more, generate_counterfactuals runs for a long time without producing a result.
With a desired output in [1, 5], it normally takes only a few seconds to generate a result.
How can I make generate_counterfactuals stop automatically when counterfactuals are hard to find, similar to early stopping in neural network training?
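For illustration, the kind of stopping rule being asked about can be sketched independently of dice-ml. This is a minimal sketch, not DiCE's implementation: `run_with_patience`, `step`, `patience`, and `min_delta` are hypothetical names, and the toy loss stands in for whatever objective the search minimizes.

```python
from typing import Callable, Optional, Tuple


def run_with_patience(
    step: Callable[[int], float],
    max_iterations: int = 500,
    patience: int = 20,
    min_delta: float = 1e-3,
) -> Tuple[Optional[float], int]:
    """Call step(i) until the loss stops improving or the budget runs out.

    Returns the best loss seen and the number of iterations actually run.
    """
    best_loss: Optional[float] = None
    stale = 0  # consecutive iterations without a meaningful improvement
    iterations_run = 0

    for i in range(max_iterations):
        loss = step(i)
        iterations_run = i + 1
        if best_loss is None or loss < best_loss - min_delta:
            best_loss = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # the search has plateaued, so give up

    return best_loss, iterations_run


# Toy loss (hypothetical stand-in) that plateaus at 1.0 after a few iterations.
best, n = run_with_patience(lambda i: max(1.0, 10.0 - i), patience=5)
print(best, n)
```

Whether dice-ml exposes an equivalent setting depends on the installed version, so it is worth checking the signature of generate_counterfactuals for the genetic method before relying on a rule like this.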

xueyagaga avatar May 18 '22 11:05 xueyagaga

The regression target is heavily right-skewed (n=244217, mean=1.11, std=0.546): most values sit near the lower bound, with a long tail toward higher values. The histogram of the distribution is shown below: [image: histogram of the target distribution]

Here is my code. When the desired range is [1, 5], generate_counterfactuals takes only a few seconds:

CF_genetic = CF_DICE.generate_counterfactuals(query_instances,
                                              total_CFs=15,
                                              desired_range=[1.0, 5.0],
                                              features_to_vary=[IV_vary,
                                                                MV_vary])
-------> 32%|███▏      | 639/2000 [1:20:01<2:29:34,  6.59s/it]

But when desired_range is set higher, for example [10, 15], generate_counterfactuals keeps running for hours without returning any results:

CF_genetic = CF_DICE.generate_counterfactuals(query_instances,
                                              total_CFs=15,
                                              desired_range=[10.0, 15.0],
                                              features_to_vary=[IV_vary,
                                                                MV_vary])
-------> Keeps running with no results

xueyagaga avatar May 18 '22 12:05 xueyagaga

@xueyagaga Maybe your model cannot produce predictions in the range [10, 15]. That may be why the DiCE explainer keeps generating more points in an attempt to reach a counterfactual. Nevertheless, the explainer should stop searching after a reasonable number of tries. How do you set up the dice-ml explainer?
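One quick way to test that hypothesis is to look at how often the model's own predictions land in the desired range. The sketch below is only an illustration: `fraction_in_range` is a hypothetical helper and the `preds` values are made up. In practice the predictions would come from something like the model's predict method applied to the training data.

```python
from typing import Sequence, Tuple


def fraction_in_range(predictions: Sequence[float],
                      desired_range: Tuple[float, float]) -> float:
    """Return the share of predictions that fall inside desired_range (inclusive)."""
    low, high = desired_range
    if len(predictions) == 0:
        return 0.0
    hits = sum(1 for p in predictions if low <= p <= high)
    return hits / len(predictions)


# Made-up predictions for illustration only.
preds = [1.0, 1.1, 1.2, 1.4, 2.0, 3.5, 4.8, 9.0]

share_low = fraction_in_range(preds, (1.0, 5.0))
share_high = fraction_in_range(preds, (10.0, 15.0))
print(share_low, share_high, max(preds))
```

If the share for the higher range is zero and the largest prediction is below the lower bound, the search has little chance of succeeding no matter how long it runs.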

gaugup avatar May 19 '22 05:05 gaugup

Thank you for the response! DiCE is really great! The code for setting up the dice-ml explainer is below:

# The trained ML prediction model is a RandomForestRegressor
d = dice_ml.Data(dataframe=dataset, continuous_features=continuous_features_housing,
                 outcome_name=outcome)
m = dice_ml.Model(model=model, backend="sklearn", model_type='regressor')
CF_DICE = dice_ml.Dice(d, m, method="genetic")

I'm wondering whether I've missed some parameter setting that would make the explainer stop.

xueyagaga avatar May 19 '22 05:05 xueyagaga