
self._normalize() got NAN...

Open noobgrow opened this issue 2 years ago • 5 comments

Bug description

Sir, when I print upsampled_a in upsampled_a = self._normalize(self.hook_a, self.hook_a.ndim - 2), I get NaN in some of the values. Is there a bug?

I set the model to vgg19 with target_layer1='features.35' and fc_layer1='classifier.6'.

Code snippet to reproduce the bug

nan

Error traceback

nan

Environment

nan

noobgrow avatar Apr 24 '22 14:04 noobgrow

Hi @noobgrow 👋

Thanks for opening the issue, but would you mind filling in the issue template properly, please? 😅

This is not a fully runnable code snippet and you didn't specify your environment :/ It's much easier for maintainers to help you efficiently with that information!

Cheers ✌️

frgfm avatar Apr 25 '22 01:04 frgfm

Bug description: Sir, when I print upsampled_a in upsampled_a = self._normalize(self.hook_a, self.hook_a.ndim - 2), I get NaN in some of the values. Is there a bug?

I set the model to vgg19 with target_layer1='features.35' and fc_layer1='classifier.6'.

Code snippet to reproduce the bug

    # Imports as in torchcam's example script (assuming torchcam >= 0.3, where the
    # extractors live in torchcam.methods; earlier releases expose torchcam.cams instead)
    from io import BytesIO
    from types import SimpleNamespace

    import requests
    import torch
    from PIL import Image
    from torchvision import models
    from torchvision.transforms.functional import normalize, resize, to_tensor
    from torchcam import methods as cams

    # Hypothetical stand-in for the example script's argparse namespace, added so the
    # snippet runs on its own (args normally comes from argparse)
    args = SimpleNamespace(img='border-collie.jpg', method=None)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    args.model = 'vgg19'
    target_layer1 = 'features.35'
    fc_layer1 = 'classifier.6'
    # Pretrained imagenet model
    model = models.__dict__[args.model](pretrained=True).to(device=device)

    # Image
    if args.img.startswith('http'):
        img_path = BytesIO(requests.get(args.img).content)
    else:
        img_path = args.img
    pil_img = Image.open(img_path, mode='r').convert('RGB')

    # Preprocess image
    img_tensor = normalize(to_tensor(resize(pil_img, (224, 224))),
                           [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]).to(device=device)

    if isinstance(args.method, str):
        methods = [args.method]
    else:
        methods = [
            'CAM',
            'GradCAM', 'GradCAMpp', 'SmoothGradCAMpp',
            'ScoreCAM', 'SSCAM', 'ISCAM',
            'XGradCAM', 'LayerCAM'
        ]
        # This second assignment overrides the full list above
        methods = [
            'GradCAM', 'GradCAMpp', 'SmoothGradCAMpp', 'ScoreCAM', 'ISCAM',
        ]

    # Hook the corresponding layer in the model
    cam_extractors = [cams.__dict__[name](model, enable_hooks=False, target_layer=target_layer1) for name in methods]
    # cam_extractors = [cams.__dict__[name](model, enable_hooks=False) for name in methods]

Error traceback: no error, but the result contains NaN.

Environment nan

noobgrow avatar Apr 25 '22 05:04 noobgrow

Alright, so you used the example script, correct?

VGG architectures cannot work with the base CAM method, for instance: they have no global average pooling layer and they have multiple consecutive linear layers. That being said, it looks like you selected

'GradCAM', 'GradCAMpp', 'SmoothGradCAMpp','ScoreCAM',  'ISCAM'

Now, _normalize is only used in the __call__ method of a CAM extractor, not in the constructor. The snippet that you provided is thus incomplete. Could you provide a minimal runnable snippet please? (including imports and everything, otherwise I cannot reproduce this behaviour :/)
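For reference, a minimal end-to-end sketch of what such a snippet would need (with a hypothetical local image path, GradCAM as the extractor, and the target layer you mentioned); the forward pass and the extractor call are the parts where _normalize actually runs:

from PIL import Image
from torchvision.models import vgg19
from torchvision.transforms.functional import normalize, resize, to_tensor
from torchcam.methods import GradCAM

model = vgg19(pretrained=True).eval()
# Hooks are registered here, but _normalize is not called yet
cam_extractor = GradCAM(model, target_layer='features.35')

pil_img = Image.open('your_image.jpg').convert('RGB')  # hypothetical path
input_tensor = normalize(to_tensor(resize(pil_img, (224, 224))),
                         [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

# Forward pass, then the extractor call: this is where _normalize runs
out = model(input_tensor.unsqueeze(0))
activation_map = cam_extractor(out.squeeze(0).argmax().item(), out)
print(activation_map)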

frgfm avatar Apr 26 '22 23:04 frgfm

Any update @noobgrow? 🙏

frgfm avatar May 16 '22 23:05 frgfm

Hello @noobgrow 👋

Without additional information from you, I cannot do much. Would you mind providing more details? Otherwise I'll close the issue since I cannot do anything about it 😅

frgfm avatar Aug 01 '22 21:08 frgfm

I can confirm this NaN bug. It mostly occurs for ScoreCAM.

from torchvision.io.image import read_image
from torchvision.transforms.functional import normalize, resize, to_pil_image
import torch
from torchvision.models import resnet18
from torchcam.methods import *

model = resnet18(pretrained=True).eval()
cam_extractor = ScoreCAM(model)
# Get your input
img = read_image("border-collie.jpg")
# Preprocess it for your chosen model
input_tensor = normalize(resize(img, (224, 224)) / 255., [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

# Preprocess your data and feed it to the model
out = model(input_tensor.unsqueeze(0))
# Retrieve the CAM by passing the class index and the model output
activation_map = cam_extractor(out.squeeze(0).argmax().item(), out)

print(activation_map)

The error happens as follows:

  1. core.py --> for weight, activation in zip(weights, self.hook_a): --> the variable self.hook_a contains NaN
  2. torch.nansum(weight * activation, dim=1) --> returns a zero tensor
  3. self._normalize(cam) --> division by zero (because minimum == maximum), illustrated in the sketch below

For other (model, CAM) combinations, numerical instabilities may also occur.
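
As an illustration of step 3 in isolation, here is a minimal sketch of the same min-max pattern (not torchcam's actual _normalize implementation):

import torch

# A flat (all-zero) CAM, as produced by torch.nansum(weight * activation, dim=1)
# when the hooked activations are NaN
cam = torch.zeros(1, 7, 7)

# Min-max normalization without an epsilon: (x - min) / (max - min)
cam_min = cam.flatten(-2).min(-1).values[..., None, None]
cam_max = cam.flatten(-2).max(-1).values[..., None, None]
print((cam - cam_min) / (cam_max - cam_min))  # 0 / 0 -> tensor full of NaN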

Out of 800 images:

  • For (efficientnet_b0, SmoothGradCAMpp) 1.125% NaN.
  • For (densenet121, GradCAM) 4.625% NaN
  • For (densenet121, GradCAMpp) 2.875% NaN
  • For (densenet121, LayerCAM) 3% NaN
  • For (densenet121, SmoothGradCAMpp) 1.625% NaN
  • For (densenet121, XGradCAM) 3.25% NaN

lars-nieradzik avatar Sep 17 '22 14:09 lars-nieradzik

Hi @lars-nieradzik 👋

Thanks for the specifics: I managed to reproduce the bug. I think I identified the problem, so I opened PR #185, which should fix it!

The problem was:

  • ScoreCAM is a bit specific: it forwards modified input tensors (some with zero variance)
  • the normalization in ScoreCAM was performed in place (hence hook_a containing NaNs, since it actually holds the normalized version)
  • to avoid this in this specific case, I added an eps to the division step of the normalization (sketched below)
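
A minimal sketch of that idea (assuming a spatial min-max normalization; not the exact code from the PR):

import torch

def minmax_normalize(cam: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Min-max normalize over the spatial dimensions; the eps avoids a 0 / 0 on flat maps
    cam_min = cam.flatten(-2).min(-1).values[..., None, None]
    cam_max = cam.flatten(-2).max(-1).values[..., None, None]
    return (cam - cam_min) / (cam_max - cam_min + eps)

print(minmax_normalize(torch.zeros(1, 7, 7)))  # zeros instead of NaNs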

frgfm avatar Sep 18 '22 11:09 frgfm