
BinaryClassifierOutputTarget implementation is likely wrong


I am currently trying to evaluate a binary classifier and produce visualizations. The output of my network is negative when the prediction should be 0, and positive when the prediction should be 1. This is similar to what happens to the logits when training with a logistic activation (BCEWithLogitsLoss in PyTorch).

Now, when I make use of BinaryClassifierOutputTarget and the prediction of my classifier is wrong, this results in a negative number: https://github.com/jacobgil/pytorch-grad-cam/blob/2183a9cbc1bd5fc1d8e134b4f3318c3b6db5671f/pytorch_grad_cam/utils/model_targets.py#L35
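
For reference, a minimal sketch of what the linked target computes (paraphrased from the linked revision; the exact code may differ):

```python
class BinaryClassifierOutputTarget:
    def __init__(self, category):
        self.category = category

    def __call__(self, model_output):
        # +1 when querying the positive category, -1 for the negative one.
        sign = 1 if self.category == 1 else -1
        # With category = 1 and a negative logit (a wrong prediction),
        # this returns a negative target.
        return model_output * sign
```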

Apparently, at least some of the CAM techniques do not work with such negative numbers. For example, for a classifier that wrongly predicts the presence of a necktie in a face image, the output looks like the top image when using BinaryClassifierOutputTarget, and like the bottom image when using abs(model_output) instead.

[Images: Binary (CAM with BinaryClassifierOutputTarget) and Abs (CAM with abs(model_output))]

I can see that BinaryClassifierOutputTarget is not used anywhere in the repository. Hence, it seems that it has never been tested, and apparently it does not work. It would be good to fix the implementation.

siebenkopf avatar Jun 13 '23 12:06 siebenkopf

Hi, thanks a lot for reporting this. Yes, this is a bug: when the category is 1 but the model output is negative, the target will be negative, and the CAM will effectively be computed for the negative category.

abs(model_output) * sign should solve this, as you suggested. It basically disregards the direction of the model output, but still has connectivity to the computation graph, so gradients can flow back through it.

I pushed the updated code and updated the PyPI package.

jacobgil avatar Jun 15 '23 12:06 jacobgil

I am afraid this is still wrong when negative labels are applied: the result of the OutputTarget must not be negative, since the CAM methods do not work with negative data.

I think we should differentiate between two cases here:

  1. When the result of the binary classifier is a logit, which can be any number, we need to restrict ourselves to abs(model_output). As you mentioned, the direction of the output is disregarded, but the computation graph can still be accessed. This at least ensures that the output is always positive.
  2. When the result of the binary classifier is a probability between 0 and 1 (e.g., the output of the logistic function, a.k.a. Sigmoid()), we need to return model_output for the positive class (category == 1) and 1 - model_output for the negative class (category != 1). This incorporates the correct direction of the probability, and the output is always positive.

I think we would need two implementations for the two cases above, similar to what we have for ClassifierOutputTarget (see the sketch after this list):

  1. BinaryOutputLogitTarget
  2. BinaryOutputSigmoidTarget
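
A minimal sketch of what these two targets could look like (hypothetical implementations of the proposal above, not part of the library):

```python
class BinaryOutputLogitTarget:
    # For raw logits: the target is always positive; the direction of
    # the logit is deliberately disregarded.
    def __init__(self, category):
        self.category = category  # kept for API symmetry, unused here

    def __call__(self, model_output):
        # abs() keeps the target positive while staying connected to the
        # computation graph, so gradients can still flow back through it.
        return abs(model_output)


class BinaryOutputSigmoidTarget:
    # For sigmoid probabilities in [0, 1]: the target is always positive
    # and respects the queried category.
    def __init__(self, category):
        self.category = category

    def __call__(self, model_output):
        # p for the positive class, 1 - p for the negative class.
        return model_output if self.category == 1 else 1 - model_output
```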

siebenkopf avatar Jun 16 '23 13:06 siebenkopf

edit: I wrote a long response, but I want to be extra sure, so I will get back to this :)

jacobgil avatar Jun 17 '23 16:06 jacobgil

First, thanks for the response and the discussion here; it's helpful to get double-checked on this.

I think CAM methods can work with negative values in the output of the target function.

The target function is just what the various CAM methods try to optimize, and the way we change it ends up emphasizing different types of activations. The gradient-based CAM methods (like Grad-CAM) compute gradients of that target function with respect to different activations. If the function is negated, the gradients end up multiplied by -1, which in a binary case means the method is then searching for the presence of the other category.
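
A tiny illustration of that point with toy numbers (not the library API): negating the target flips the sign of the gradient with respect to an activation.

```python
import torch

def grad_of(target_fn):
    x = torch.tensor(3.0, requires_grad=True)  # dummy "activation"
    model_output = 2.0 * x                     # toy stand-in for the network
    target_fn(model_output).backward()
    return x.grad

print(grad_of(lambda out: out))       # tensor(2.)
print(grad_of(lambda out: -1 * out))  # tensor(-2.): sign flipped
```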

However, I now see that using abs(model_output) is actually wrong as well.

  • If the target function is just abs(model_output), we will get the same CAM no matter whether we query it for the first category or the second.

    Since we want to be able to query for different categories, this doesn't work. What this does, by the way, is highlight activations that have a strong response for either of the two categories. If the categories were dog/cat, for example, and an image contained both a dog and a cat, it would just highlight regions with strong activations for either one.

  • What I suggested with abs(model_output) * sign(category) doesn't work either. If category = 1, we get target = abs(model_output), and activations that pull the model output to be very negative receive a positive target and positive gradients, so they would still be highlighted by the CAM when querying for the positive category; not good (see the sketch below).
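
To make the abs(model_output) failure concrete, here is a toy sketch (not library code) of the gradient behavior for a negative logit:

```python
import torch

logit = torch.tensor(-2.0, requires_grad=True)  # model leans towards category 0
torch.abs(logit).backward()

# d|x|/dx = sign(x) = -1 here: an activation pushing the logit even more
# negative *increases* the target, so it is highlighted even when we
# query for the positive category.
print(logit.grad)  # tensor(-1.)
```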

So it looks like the original function model_output * sign(category) was actually correct after all.

Let's break it down for the two cases:

  • If category = 1: The target function is model_output, and the CAMs search for signal that increases model_output, which means a larger response for the positive (second) category.

It doesn't matter here what the values in model_output were in the first place, or whether they were wrong: if the gradients are $$\frac{\partial\, \text{model\_output}}{\partial w}$$, this searches for activations that push the model output to be more positive.

  • If category = 0: The target function is -1 * model_output, and the CAMs search for signal that decreases model_output, which means a larger response for the negative (first) category.
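
A quick numeric check of the two cases above with toy values (hypothetical names, not the library API); note the gradient direction depends only on the queried category, not on the current value of the output:

```python
import torch

def cam_gradient(category):
    x = torch.tensor(1.0, requires_grad=True)  # dummy activation
    model_output = 0.5 * x - 2.0               # toy logit, negative here
    sign = 1 if category == 1 else -1
    (model_output * sign).backward()
    return x.grad

print(cam_gradient(1))  # tensor(0.5000): pushes the output up (category 1)
print(cam_gradient(0))  # tensor(-0.5000): pushes the output down (category 0)
```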

Side note: Something tricky here is separating a bad model from a bad target function or a bad CAM method. (edit: by "bad" here I mean some issue or edge case in the model/method)

In the first image it highlights the forehead. This could be because the target function is wrong. But it could potentially also be because the model (wrongly) uses that region somehow, and then you actually want this in the explanation because you want to be aware of it.

jacobgil avatar Jun 17 '23 17:06 jacobgil

While the theory partially sounds reasonable, in practice negative numbers do not seem to work properly. We ran an experiment covering four different cases:

  • top-left: category = 1, prediction > 0
  • top-right: category = 1, prediction < 0
  • bottom-left: category = 0, prediction > 0
  • bottom-right: category = 0, prediction < 0

When applying abs(model_output), we obtain the following, which appears to be reasonable: [image: abs_out]

When applying abs(model_output) * label, the bottom two images are wrong: [image: abs_out_gt]

When applying model_output * label, the off-diagonal images are wrong: [image: out_gt]

Hence, whenever the output is negative, the result is garbage. The best and most reasonable result is obtained with abs(model_output).

I totally agree that we need to differentiate between the visualization and the model. With a bad model, the visualization cannot highlight the correct regions. However, when the visualizations look reasonable in one case (for one version of the output) and like garbage in the other, that is a strong indication that the former is correct and the latter is not. The top visualization in my first example highlights the bottom-left of the image, and we have observed the same behavior in all similar cases.

A binary classifier has one output. Hence, Grad-CAM can only highlight the one region on which the decision is based, independent of the actual ground truth. Therefore, I do not think that we can differentiate between visualizing category = 0 and category = 1.

However, it might be that your theoretical assessment is correct and another part of the implemented pipeline is wrong, making negative outputs produce garbage.

siebenkopf avatar Jun 19 '23 13:06 siebenkopf