
Counterfactual prototype and categorical variables

Open prabhathur opened this issue 3 years ago • 14 comments

Hello, I am trying to generate counterfactuals for a dataset that contains both categorical and continuous variables. The categorical variables are mostly binary (along with some non-ordinal ones). I am running the algorithm with one-hot encoding instead of treating the variables as ordinal (which would probably introduce multicollinearity). When I generate counterfactuals, the suggested changes to the categorical values are counterintuitive and go against the SHAP interpretation (e.g. a value of 1 for a categorical variable is more favourable towards class 1, yet for some instances the algorithm suggests changing the value from 1 to 0, while there is not a single instance where that variable was changed from 0 to 1). If I instead build the categorical ohe mapping (created by the ord_to_ohe function) using dummy variables (to get rid of the multicollinearity, so for a binary column it is as good as keeping one column), I run into further errors while running the algorithm (AssertionError, line 704 of cfproto.py). Is there a way to properly handle binary categorical variables?
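For reference, this is roughly how the mapping is built (a minimal sketch with a hypothetical single 5-category feature; dropping one dummy column afterwards changes the column counts that cfproto.py asserts against):

```python
import numpy as np
from alibi.utils.mapping import ord_to_ohe

# hypothetical categorical feature at column 0 with 5 categories
data_ord = np.array([[0], [1], [2], [3], [4]])
cat_vars_ord = {0: 5}  # column index -> number of categories

data_ohe, cat_vars_ohe = ord_to_ohe(data_ord, cat_vars_ord)
print(cat_vars_ohe)  # {0: 5}: the explainer expects all 5 one-hot columns
# dropping a dummy column leaves only 4 columns, so the mapping no longer matches
```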

prabhathur avatar Mar 16 '21 06:03 prabhathur

The counterintuitive examples might be due to them being out-of-distribution counterfactuals. What are the settings of the explainer? Do you use an autoencoder or kd-trees for the prototypes?

I'm not sure I follow the steps that lead to the error; do you have an example of the data and the ohe mapping that results in it?

jklaise avatar Mar 16 '21 17:03 jklaise

I tried both with and without the kd-tree argument. The results from both were the same. The counterintuitiveness comes only from the categorical variables.

Since it felt like multicollinearity was the problem, I dropped one-hot encoding and just kept the binary variables as-is (with no embedding layers in the model) and removed the ohe flag while initializing CounterFactualProto. Then I tried to generate a counterfactual for an instance (the instance had the 5th, 7th and 8th variables' values as 1).
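Roughly, the setup looks like this (a sketch; `model`, `X_train` and the categorical mapping are placeholders):

```python
from alibi.explainers import CounterFactualProto

shape = (1,) + X_train.shape[1:]         # (1, 19) for the instance below
cat_vars_ord = {i: 2 for i in range(9)}  # hypothetical: first 9 columns binary

cf = CounterFactualProto(model.predict, shape,
                         beta=0.01,                  # sparsity coefficient
                         cat_vars=cat_vars_ord,
                         ohe=False,                  # ordinal, not one-hot
                         feature_range=(-10., 10.),  # range of the scaled data
                         max_iterations=500,
                         c_init=1., c_steps=5)
cf.fit(X_train, d_type='mvdm')  # embed categorical values via MVDM
explanation = cf.explain(X, target_class=[1], verbose=True)  # X: instance below
```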

Input instance- [0, 0.0, 0.0, 0, 1, 0.0, 1, 1, 0, -10.0, -10.0, -4.615384615384615, -9.84732824427481, -7.5, -3.333333333333332, -6.8761212121212125, -10.0, 8.34557011430786, -1.617647058823529]

Here is the verbose output for the generation.

[[0.6817306 0.3182694]] Predicted class: [0] Target classes: [1]

Iteration: 0; Const: 1.0 Loss total: 0.363, loss attack: 0.363 L2: 0.000, L1: 0.000, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.68, max non target proba: 0.32 Gradient graph min/max: -0.672/0.154 Gradient graph mean/abs mean: -0.171/0.201

Iteration: 100; Const: 1.0 Loss total: 0.403, loss attack: 0.363 L2: 0.036, L1: 0.321, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.68, max non target proba: 0.32 Gradient graph min/max: -0.186/0.197 Gradient graph mean/abs mean: -0.010/0.069

Iteration: 200; Const: 1.0 Loss total: 0.403, loss attack: 0.363 L2: 0.036, L1: 0.322, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.68, max non target proba: 0.32 Gradient graph min/max: -0.186/0.197 Gradient graph mean/abs mean: -0.010/0.069

Iteration: 300; Const: 1.0 Loss total: 0.403, loss attack: 0.363 L2: 0.036, L1: 0.323, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.68, max non target proba: 0.32 Gradient graph min/max: -0.186/0.197 Gradient graph mean/abs mean: -0.010/0.069

Iteration: 400; Const: 1.0 Loss total: 0.403, loss attack: 0.363 L2: 0.036, L1: 0.324, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.68, max non target proba: 0.32 Gradient graph min/max: -0.186/0.197 Gradient graph mean/abs mean: -0.010/0.069

Iteration: 0; Const: 10.0 Loss total: 3.582, loss attack: 3.571 L2: 0.008, L1: 0.230, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.68, max non target proba: 0.32 Gradient graph min/max: -6.716/1.544 Gradient graph mean/abs mean: -1.713/2.012

Iteration: 100; Const: 10.0 Loss total: 3.842, loss attack: 3.635 L2: 0.198, L1: 0.941, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.68, max non target proba: 0.32 Gradient graph min/max: -1.172/0.426 Gradient graph mean/abs mean: -0.373/0.456

Iteration: 200; Const: 10.0 Loss total: 3.810, loss attack: 3.635 L2: 0.167, L1: 0.824, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.68, max non target proba: 0.32 Gradient graph min/max: -1.210/0.426 Gradient graph mean/abs mean: -0.385/0.468

Iteration: 300; Const: 10.0 Loss total: 3.737, loss attack: 3.635 L2: 0.096, L1: 0.624, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.68, max non target proba: 0.32 Gradient graph min/max: -1.391/0.426 Gradient graph mean/abs mean: -0.406/0.488

Iteration: 400; Const: 10.0 Loss total: 3.681, loss attack: 3.635 L2: 0.043, L1: 0.398, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.68, max non target proba: 0.32 Gradient graph min/max: -1.722/0.426 Gradient graph mean/abs mean: -0.429/0.512

Iteration: 0; Const: 100.0 Loss total: 84.917, loss attack: 83.723 L2: 1.160, L1: 3.322, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.92, max non target proba: 0.08 Gradient graph min/max: -67.161/15.444 Gradient graph mean/abs mean: -17.131/20.125

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

Iteration: 100; Const: 100.0 Loss total: 13.949, loss attack: 0.000 L2: 13.836, L1: 11.316, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.49, max non target proba: 0.51 Gradient graph min/max: -0.891/5.469 Gradient graph mean/abs mean: 1.053/1.230

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

Iteration: 200; Const: 100.0 Loss total: 14.855, loss attack: 0.000 L2: 14.736, L1: 11.912, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.35, max non target proba: 0.65 Gradient graph min/max: -86.065/33.855 Gradient graph mean/abs mean: -19.094/27.745

New best counterfactual found!

New best counterfactual found!

New best counterfactual found!

Iteration: 300; Const: 100.0 Loss total: 13.697, loss attack: 0.000 L2: 13.585, L1: 11.200, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.39, max non target proba: 0.61 Gradient graph min/max: -85.958/33.807 Gradient graph mean/abs mean: -19.147/27.753

Iteration: 400; Const: 100.0 Loss total: 12.037, loss attack: 0.000 L2: 11.939, L1: 9.854, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.47, max non target proba: 0.53 Gradient graph min/max: -2.753/4.750 Gradient graph mean/abs mean: 0.771/1.063

Iteration: 0; Const: 55.0 Loss total: 43.017, loss attack: 42.610 L2: 0.389, L1: 1.848, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.89, max non target proba: 0.11 Gradient graph min/max: -36.939/8.494 Gradient graph mean/abs mean: -9.422/11.069

Iteration: 100; Const: 55.0 Loss total: 15.772, loss attack: 0.000 L2: 15.649, L1: 12.348, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.47, max non target proba: 0.53 Gradient graph min/max: -40.403/10.710 Gradient graph mean/abs mean: -9.581/12.927

Iteration: 200; Const: 55.0 Loss total: 15.547, loss attack: 0.000 L2: 15.428, L1: 11.855, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.46, max non target proba: 0.54 Gradient graph min/max: -40.410/10.723 Gradient graph mean/abs mean: -9.623/12.970

Iteration: 300; Const: 55.0 Loss total: 14.397, loss attack: 0.000 L2: 14.284, L1: 11.240, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.49, max non target proba: 0.51 Gradient graph min/max: -0.863/5.473 Gradient graph mean/abs mean: 1.045/1.215

Iteration: 400; Const: 55.0 Loss total: 14.611, loss attack: 0.000 L2: 14.500, L1: 11.111, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.49, max non target proba: 0.51 Gradient graph min/max: -0.871/5.542 Gradient graph mean/abs mean: 1.027/1.196

Iteration: 0; Const: 32.5 Loss total: 10.853, loss attack: 10.716 L2: 0.127, L1: 1.035, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.66, max non target proba: 0.34 Gradient graph min/max: -21.827/5.019 Gradient graph mean/abs mean: -5.567/6.541

Iteration: 100; Const: 32.5 Loss total: 30.274, loss attack: 24.274 L2: 5.927, L1: 7.344, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.87, max non target proba: 0.13 Gradient graph min/max: -9.189/2.644 Gradient graph mean/abs mean: -1.923/2.411

Iteration: 200; Const: 32.5 Loss total: 29.335, loss attack: 18.833 L2: 10.404, L1: 9.820, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.79, max non target proba: 0.21 Gradient graph min/max: -13.307/3.412 Gradient graph mean/abs mean: -3.042/3.844

Iteration: 300; Const: 32.5 Loss total: 14.938, loss attack: 0.000 L2: 14.821, L1: 11.679, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.49, max non target proba: 0.51 Gradient graph min/max: -22.056/6.825 Gradient graph mean/abs mean: -5.280/7.256

Iteration: 400; Const: 32.5 Loss total: 14.731, loss attack: 0.000 L2: 14.619, L1: 11.249, loss AE: 0.000 Loss proto: 0.000 Target proba: 0.50, max non target proba: 0.50 Gradient graph min/max: -20.059/6.696 Gradient graph mean/abs mean: -4.071/6.344

Generated CF, accessed through explanation.cf['X'] [[ 0. 0. 0. 0. 0. 0. 0. 0. 0. -7.7922173 -10. -4.6153846 -9.143833 -7.5 -2.5690596 -6.5932193 -10. 8.551834 -2.8263392]]

The final CF has changed the 5th, 7th and 8th variables from 1 to 0, with the model prediction for the CF being [0.49967718, 0.5003228]. But if I substitute the 5th, 7th and 8th variables in the CF back to 1 and then get the model prediction, I get [0.11331835, 0.8866817].

```python
import numpy as np

modified_cf = np.array([0., 0., 0., 0., 1., 0., 1., 1., 0., -7.7922173, -10.,
                        -4.6153846, -9.143833, -7.5, -2.5690596, -6.5932193,
                        -10., 8.551834, -2.8263392]).reshape(1, -1)
model.predict(modified_cf)
```

array([[0.11331835, 0.8866817 ]], dtype=float32)

This indicates that these variables contribute positively to the prediction of class 1, so why exactly is the algorithm forcing a change which isn't actually beneficial? As seen in the verbose output, the prototype loss is 0 (as I have not used a kd-tree or an AE).

prabhathur avatar Mar 17 '21 07:03 prabhathur

Thanks for providing the example. Since it produces an output, is it fair to assume you don't have the issues with the AssertionError anymore?

That is an interesting example. What are your settings for feature_ranges, standardize_cat_vars, update_feature_range? There could be many reasons why the method prefers to return some changes but not others, e.g.

  • The manual substitution you did might be a more out-of-distribution counterfactual than the one actually returned. Although, because no kd-tree or AE was used, the prototype loss is 0, so this should not be a factor in this case.
  • Different scales of features will have an impact on the loss; for optimal results, all variables should be on the same scale. By providing feature_range for each feature, both numerical and categorical values (after embedding in numerical space) are constrained to this range. Since a priori you wouldn't know what numerical values correspond to the categorical feature embeddings, passing update_feature_range=True to fit determines this and updates the range automatically (as long as some feature_range is passed to the explainer initially); see the sketch after this list.
  • Different values of the coefficient beta for the L1 sparsity loss may have an impact on the sparsity of the returned CF.
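As a sketch of the second point (placeholder names; per-feature ranges are passed as (1 x number of features) arrays):

```python
import numpy as np

# per-feature lower and upper bounds as (1, n_features) arrays
feature_range = (X_train.min(axis=0).reshape(1, -1),
                 X_train.max(axis=0).reshape(1, -1))

cf = CounterFactualProto(model.predict, shape,
                         cat_vars=cat_vars_ord, ohe=False,
                         feature_range=feature_range)
# update_feature_range=True (the default) updates the ranges of the
# categorical columns once their numerical embeddings are known
cf.fit(X_train, d_type='mvdm', update_feature_range=True)
```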

jklaise avatar Mar 17 '21 16:03 jklaise

Thanks for providing the example. Since it produces an output, is it fair to assume you don't have the issues with the AssertionError anymore?

The assertion error was coming when I was trying one-hot encoding and dropping one of the one-hot encoded columns to avoid multicollinearity (so for a categorical feature with 5 unique values, dropping one would leave me with 4 columns). The cat_vars_ohe dictionary was slightly modified because of this, which later gave an assertion error when using fit(). To avoid this I removed the non-ordinal variables and proceeded without one-hot encoding.

That is an interesting example. What are your settings for feature_ranges, standardize_cat_vars, update_feature_range?

For the feature range I am providing the min and max values just like in the documentation. standardize_cat_vars and update_feature_range use the default values.

Different scales of features will have an impact on the loss; for optimal results, all variables should be on the same scale. By providing feature_range for each feature, both numerical and categorical values (after embedding in numerical space) are constrained to this range. Since a priori you wouldn't know what numerical values correspond to the categorical feature embeddings, passing update_feature_range=True to fit determines this and updates the range automatically (as long as some feature_range is passed to the explainer initially).

I was initially scaling my continuous values to be in the range -1 to 1. In this case the categorical features were not changing in a large majority of the CFs. I tried -2 to 2 and -3 to 3, in which case the categorical values started changing a bit more in the CFs (although counterintuitively). In the example provided above the continuous features are scaled between -10 and 10. My guess was that for the binary variables the MVDM distance turned out to be nearly 0.4 to 0.5, which looked like a large change, because of which the algorithm was preferring the continuous variables. Is this correct?

Different values of the coefficient beta for the L1 sparsity loss may have an impact on the sparsity of the returned CF.

The above example is based on a beta value of 0.01. I will try to increase it and check, but it still feels strange that the algorithm changes the categorical features from 1 to 0, which not only increases the L1 and L2 distances but also has a negative effect on the predicted probability of the target class.

prabhathur avatar Mar 18 '21 09:03 prabhathur

The assertion error was coming when I was trying one-hot encoding and dropping one of the one-hot encoded columns to avoid multicollinearity (so for a categorical feature with 5 unique values, dropping one would leave me with 4 columns). The cat_vars_ohe dictionary was slightly modified because of this, which later gave an assertion error when using fit(). To avoid this I removed the non-ordinal variables and proceeded without one-hot encoding.

I understand; I think we should study the case and extend the method to one-hot encoding that drops a column, as this may be rather common, especially with generalized linear models.

For the feature range I am providing the min and max values just like in the documentation. standardize_cat_vars and update_feature_range use the default values.

So, one value for all features? E.g. feature_range = (-10, 10)?

I was initially scaling my continuous values to be in the range -1 to 1. In this case the categorical features were not changing in a large majority of the CFs. I tried -2 to 2 and -3 to 3, in which case the categorical values started changing a bit more in the CFs (although counterintuitively). In the example provided above the continuous features are scaled between -10 and 10. My guess was that for the binary variables the MVDM distance turned out to be nearly 0.4 to 0.5, which looked like a large change, because of which the algorithm was preferring the continuous variables. Is this correct?

This makes sense: the change in the MVDM distance would be large compared to numerical features scaled to (-1, 1), but small if they are scaled across a longer interval.

The above example is based on a beta value of 0.01. I will try to increase it and check, but it still feels strange that the algorithm changes the categorical features from 1 to 0, which not only increases the L1 and L2 distances but also has a negative effect on the predicted probability of the target class.

There is currently a limitation in the algorithm in that it does not consider the actual predicted probability but finishes whenever the argmax of the probability vector is a different class than the original. This would explain why the returned counterfactuals are classified with confidence just above p=0.50. We need to schedule some time to extend the method to work with prescribed probabilities (the vanilla CounterFactual method does this, but does not support categorical variables).
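For reference, a minimal sketch of how the vanilla method takes a prescribed probability (placeholder names; it has no categorical support):

```python
from alibi.explainers import CounterFactual

# target_proba prescribes the probability for the target class;
# tol controls how close the CF prediction must get to it
cf_vanilla = CounterFactual(model.predict, shape,
                            target_class='other',
                            target_proba=0.9, tol=0.05)
```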

jklaise avatar Mar 18 '21 18:03 jklaise

I understand; I think we should study the case and extend the method to one-hot encoding that drops a column, as this may be rather common, especially with generalized linear models.

This would be great.

So, one value for all features? E.g. feature_range = (-10, 10)?

Yes, the continuous features were scaled to (-10, 10), while for the categorical ones it picks the values from the dataset.

There is currently a limitation in the algorithm in that it does not consider the actual predicted probability but finishes whenever the argmax of the probability vector is a different class than the original. This would explain why the returned counterfactuals are classified with confidence just above p=0.50. We need to schedule some time to extend the method to work with prescribed probabilities (the vanilla CounterFactual method does this, but does not support categorical variables).

While having control over the predicted probability would be great, I am right now happy with it being just above 0.5. The only reason I brought up the probability was to indicate that changing the categorical feature value was decreasing it (along with increasing the penalty from the L1 and L2 distances). Why is the algorithm changing the 1s into 0s in the first place when it has a negative effect on the probability? Isn't that counterintuitive?

prabhathur avatar Mar 22 '21 13:03 prabhathur

While having control over the predicted probability would be great, I am right now happy with it being just above 0.5. The only reason I brought up the probability was to indicate that changing the categorical feature value was decreasing it (along with increasing the penalty from the L1 and L2 distances). Why is the algorithm changing the 1s into 0s in the first place when it has a negative effect on the probability? Isn't that counterintuitive?

It probably doesn't have a negative effect on the probability at the point of the optimization where it was changed, and even if it did, it could happen because the loss being minimized is a linear combination of the prediction on the CF and sparsity (plus the prototype and AE losses if those are turned on), so it wouldn't be too surprising.

P.S. I keep accidentally editing your posts instead of writing replies, apologies!

jklaise avatar Mar 22 '21 13:03 jklaise

It probably doesn't have a negative effect on the probability at the point of the optimization where it was changed, and even if it did, it could happen because the loss being minimized is a linear combination of the prediction on the CF and sparsity (plus the prototype and AE losses if those are turned on), so it wouldn't be too surprising.

Changing it for one or two instances may not be surprising. But I generated CFs for 5000 instances. It changed one of the categorical values from 1 to 0 in about 300 of those, with not even one instance where it changed from 0 to 1 (despite SHAP indicating that changing from 0 to 1 is beneficial).

prabhathur avatar Mar 22 '21 14:03 prabhathur

Changing it for one or two instances may not be surprising. But I generated CFs for 5000 instances. It changed one of the categorical values from 1 to 0 in about 300 of those, with not even one instance where it changed from 0 to 1 (despite SHAP indicating that changing from 0 to 1 is beneficial).

It's probably because of the sparsity loss being in effect. Unfortunately, with the current implementation it is not possible to fully turn it off to compare whether just having the prediction loss would lead to different behaviour...

jklaise avatar Mar 22 '21 15:03 jklaise

Hello, I just have a small update. In order to see how the categorical variables behave and eliminate the influence of the continuous variables, I ran the generation process on 2 sets of identical data containing only categorical variables, with one difference. The first set contains the 12 binary categorical variables that I was using previously in the above example. The second contains the same 12 categorical variables with the 1s and 0s flipped. For the 1st set, all the variables being 1 had a positive effect on the required target class, and in the 2nd, since we flipped the 1s and 0s, the 0s now had a positive effect on the required target.

It turns out that for the first set no counterfactuals were generated, while for the 2nd set it was able to generate counterfactuals for all the instances I fed it (more than 3500 with no failure). The only difference between the 2 datasets is the flipping of 1s and 0s: in the 1st set the counterfactuals would require features to be changed from 0 to 1, and in the 2nd the counterfactuals would require features to be changed from 1 to 0. For some reason it seems to be very comfortable doing the 2nd change, but it doesn't change any feature from 0 to 1. The same pattern was visible in the above examples with mixed data. While I haven't tried extensively, ordinal variables which are not binary were getting changed from 0 to other values, so this seems specific to binary variables.
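The flip itself was just (a sketch; X_cat is a placeholder holding only the 12 binary columns):

```python
X_flipped = 1 - X_cat  # 0 -> 1 and 1 -> 0 for every binary feature
# a new model is then trained on X_flipped before generating counterfactuals
```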

prabhathur avatar Mar 26 '21 11:03 prabhathur

@prabhathur hard to say exactly what the issue is without a minimal reproducible example. Was the classifier re-trained after the labels were flipped?

jklaise avatar Mar 26 '21 14:03 jklaise

Was the classifier re-trained after the labels were flipped?

I was basically running it in 2 separate notebooks, and the model was re-trained after the flipping.

hard to say exactly what the issue is without a minimal reproducible example.

I tried generating on a simpler dataset and it's showing similar behaviour there as well. Attaching 2 notebook files, one with the feature values flipped and one without: Notebooks.zip

prabhathur avatar Mar 28 '21 17:03 prabhathur

@prabhathur I've run the two notebooks and can confirm that I'm observing similar behaviour. I will have to dig deeper into the logic to understand what may be causing this.

jklaise avatar Apr 14 '21 16:04 jklaise

@prabhathur I believe I've found the source of the issue and it's a bug on our part. It's actually a few inter-related issues plus a lack of documentation, for which I will open new issues.

Basically, the culprit is setting rng_shape to (0, 1) (from which feature_range is derived). feature_range is supposed to be the numerical range for every feature; for categorical variables, it is supposed to be the numerical range after the embedding into numerical space (done by e.g. mvdm).

Of course, you can't know this range a priori, so the default behaviour of fit is update_feature_range=True so that it is taken care of automatically. Unfortunately, because we use TensorFlow 1.x constructs internally, this does not actually happen: the TensorFlow graph holds onto the old ranges ((0, 1) in this case). It is not clear whether with TensorFlow 1.x it is possible to easily update this range after the computational graph has been defined.

However, using the default options to fit, the categorical variables are embedded into numerical space in the range (-1, 1) (this is currently undocumented; it can also be changed by passing standardize_cat_vars=True, in which case the range is not predictable but the embedded values have mean 0 and std 1).

Therefore, an easy workaround for your use case is to use rng_shape=(-1, 1); with this setting, counterfactuals are found in both cases (original and flipped 0s and 1s). The reason they were previously found for only one of the cases is that we were clipping the perturbations to the (0, 1) range, but the embedded values for binary categorical variables are (-x, x) where |x| <= 1, so we were always clipping the perturbations to 0, which failed to find counterfactuals in the original case.
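Concretely, the workaround looks like this (a sketch with placeholder names; equivalently, pass feature_range=(-1., 1.) directly):

```python
# make the allowed perturbation range match the (-1, 1) interval that the
# categorical embeddings land in with the default fit() options
# (standardize_cat_vars=False)
cf = CounterFactualProto(model.predict, shape,
                         cat_vars=cat_vars_ord, ohe=False,
                         feature_range=(-1., 1.))
cf.fit(X_train, d_type='mvdm')
explanation = cf.explain(X, target_class=[1])
```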

Now there are a few more considerations: if you have a mix of continuous and categorical variables, it is currently not straightforward to set the feature_range appropriately for both because of the above reasons.

Furthermore, there is a bug in leaving the default feature_range (-10000000000.0, 10000000000.0) when categorical variables are present, as this tuple needs to be converted to arrays internally before doing the clipping.

jklaise avatar Apr 15 '21 10:04 jklaise