CEMExplainer for pertinent positive returning no changes

Open ellieyhcheng opened this issue 5 years ago • 1 comments

I'm trying to explain a CIFAR-10 classification model with CEMExplainer. By experimenting with the parameters I got a result for the pertinent negative, but I haven't been able to get any results for pertinent positives. What am I doing wrong?

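import numpy as np
import tensorflow as tf
# Import path assumed from the AIX360 package layout
from aix360.algorithms.contrastive import CEMExplainer, KerasClassifier

# `model` is the trained Keras CIFAR-10 classifier (defined elsewhere)
mymodel = KerasClassifier(model)
explainer = CEMExplainer(mymodel)

img = x_test[0]
arg_max_iter = 1000    # maximum number of search iterations
arg_init_const = 10.0  # initial coefficient of the main loss term
arg_b = 9              # number of updates to that coefficient
arg_kappa = 0.05       # minimum confidence gap
arg_beta = 1e-1        # controls sparsity of the solution (L1 term)
arg_gamma = 100        # weight on adherence to the autoencoder

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  arg_mode = "PP"  # Find pertinent positive
  # `ae` is the autoencoder passed to CEM (not shown in this snippet)
  (adv_pp, delta_pp, info_pp) = explainer.explain_instance(np.expand_dims(img, axis=0), arg_mode, ae, arg_kappa, arg_b, arg_max_iter, arg_init_const, arg_beta, arg_gamma)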

Mar 26 '20 07:03 ellieyhcheng

Hi Ellieyh, the standard CEM explainer (not CEM-MAF) in the image space mainly applies to grayscale images, and it assumes that the input features on which the model is trained are normalized to the range -0.5 to 0.5. CIFAR images are in color, so although you are getting some PNs, I am not sure how valid or good they are. Not getting PPs might just be an artifact of applying the method outside the setting it was designed for. CEM-MAF may be more appropriate here, although it requires more information, such as learned interpretable features along which to perform the perturbation. If you still decide to apply CEM, make sure at least that the inputs are normalized correctly before you use it. Also note that the trivial PP is just the original point, so do not be alarmed if you get that for some inputs.
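As a rough illustration of the two checks mentioned above, here is a minimal NumPy sketch. It assumes the raw images are uint8 arrays with pixel values in 0..255. The helper names `normalize_for_cem` and `is_trivial_pp` are hypothetical and not part of any library. The model itself would also need to have been trained on inputs scaled the same way.

```python
import numpy as np


def normalize_for_cem(images):
    """Map uint8 pixels in [0, 255] to floats in [-0.5, 0.5] (hypothetical helper)."""
    return images.astype(np.float32) / 255.0 - 0.5


def is_trivial_pp(original, candidate, atol=1e-6):
    """Return True when a candidate PP is numerically the original input (hypothetical helper)."""
    return bool(np.allclose(original, candidate, atol=atol))


# Random stand-in for a small batch of CIFAR-10-shaped images (not real data).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(2, 32, 32, 3), dtype=np.uint8)

x = normalize_for_cem(raw)
print("value range:", float(x.min()), float(x.max()))

# A candidate identical to the input counts as the trivial PP.
print("trivial PP:", is_trivial_pp(x[:1], x[:1].copy()))
```

In practice, `candidate` would be whichever array returned by the explainer holds the pertinent positive, compared against the normalized input that was passed in.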

Mar 27 '20 12:03 sadhamanus