pytorch-grad-cam
RuntimeError using custom architecture
Hi,
I am trying to run GradCAM over a custom architecture I have created. The architecture is as follows:
(convnet): Sequential(
  (0): Conv2d(3, 59, kernel_size=(6, 6), stride=(1, 1))
  (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (2): ReLU()
  (3): Dropout(p=0.1694977121723289, inplace=False)
  (4): Conv2d(59, 59, kernel_size=(5, 5), stride=(1, 1))
  (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(fc): Sequential(
  (0): Linear(in_features=297419, out_features=106, bias=True)
  (1): ReLU()
)
This architecture is an embedding network, so I am using the [Pixel Attribution for Embeddings notebook](https://github.com/jacobgil/pytorch-grad-cam/blob/master/tutorials/Pixel%20Attribution%20for%20embeddings.ipynb) to try to generate a heatmap. For the moment I have it set to run on the default images.
When running the code for "Where is the car in the image", I hit the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-9-9e1c4f908026> in <module>
19 use_cuda=False) as cam:
20 car_grayscale_cam = cam(input_tensor=input_tensor,
---> 21 targets=car_targets)[0, :]
22
23
~/.virtualenvs/pytorch-dolphin-detection/lib/python3.6/site-packages/pytorch_grad_cam/base_cam.py in __call__(self, input_tensor, targets, aug_smooth, eigen_smooth)
187
188 return self.forward(input_tensor,
--> 189 targets, eigen_smooth)
190
191 def __del__(self):
~/.virtualenvs/pytorch-dolphin-detection/lib/python3.6/site-packages/pytorch_grad_cam/base_cam.py in forward(self, input_tensor, targets, eigen_smooth)
82 loss = sum([target(output)
83 for target, output in zip(targets, outputs)])
---> 84 loss.backward(retain_graph=True)
85
86 # In most of the saliency attribution papers, the saliency is
~/.virtualenvs/pytorch-dolphin-detection/lib/python3.6/site-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
305 create_graph=create_graph,
306 inputs=inputs)
--> 307 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
308
309 def register_hook(self, hook):
~/.virtualenvs/pytorch-dolphin-detection/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
154 Variable._execution_engine.run_backward(
155 tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 156 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
157
158
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
From a few other closed threads on this issue, it seems there is something I need to do with:
with torch.no_grad()
However, I am at a complete loss as to where this needs to go, or whether it is a deeper problem with my custom embedding network. Any help would be greatly appreciated. I am running Python 3.6.9 and grad-cam version 1.4.5.
Code (dirs edited out):
import pickle
import torch

# A model wrapper that gets a model and returns the features before the
# fully connected layer.
class FeatureExtractor(torch.nn.Module):
    def __init__(self, model):
        super(FeatureExtractor, self).__init__()
        self.model = model
        # Drop the last child (the fc head); assumes convnet is registered first
        self.feature_extractor = torch.nn.Sequential(*list(self.model.children())[:-1])

    def __call__(self, x):
        # Index one spatial location to collapse (B, C, H, W) to (B, C)
        return self.feature_extractor(x)[:, :, 0, 0]
### Load the model with the optimal hyperparams previously located, based on best model checkpoint
best_params_src = $SRC
with open(best_params_src, 'rb') as f:
    best_params = pickle.load(f)
print(best_params)
resize_to = (300,300)
model_input_shape = [1, 3, resize_to[0], resize_to[1]]
import torch.nn as nn

### Model
log_dir = $SRC
loader = torch.load(find_best_checkpoint(log_dir, n_way=True))
model_loader = loader['Model']

# Create a blank embedding model to build upon when loading in
loaded_model = Network(nlayers=best_params['nlayers'],
                       hidden_size=best_params['hidden_size'],
                       kernel_size=best_params['kernel_size'],
                       dropout=best_params['dropout'],
                       expected_img_shape=model_input_shape,
                       emb_size=best_params['emb_size'])

# If multiple GPUs then parallelise the model
if cuda and num_gpus > 1:
    loaded_model = nn.DataParallel(loaded_model)
    loaded_model.cuda()

# Load, set to eval, create FeatureExtractor
loaded_model.load_state_dict(model_loader)
loaded_model.eval()
model = FeatureExtractor(loaded_model)
car_img, car_img_float, car_tensor = get_image_from_url("https://www.wallpapersin4k.org/wp-content/uploads/2017/04/Foreign-Cars-Wallpapers-4.jpg")
cloud_img, cloud_img_float, cloud_tensor = get_image_from_url("https://th.bing.com/th/id/OIP.CmONj_pGCXg9Hq9-OxTD9gHaEo?pid=ImgDet&rs=1")
car_concept_features = model(car_tensor)[0, :]
cloud_concept_features = model(cloud_tensor)[0, :]
Image.fromarray(np.hstack((cloud_img, car_img)))
class SimilarityToConceptTarget:
    def __init__(self, features):
        self.features = features

    def __call__(self, model_output):
        cos = torch.nn.CosineSimilarity(dim=0)
        return cos(model_output, self.features)
target_layers = [loaded_model.module.convnet[-1]]
car_targets = [SimilarityToConceptTarget(car_concept_features)]
cloud_targets = [SimilarityToConceptTarget(cloud_concept_features)]

# Where is the car in the image
with GradCAM(model=model,
             target_layers=target_layers,
             use_cuda=False) as cam:
    car_grayscale_cam = cam(input_tensor=input_tensor,
                            targets=car_targets)[0, :]  # <---- ERROR HERE
car_cam_image = show_cam_on_image(image_float, car_grayscale_cam, use_rgb=True)
Image.fromarray(car_cam_image)
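As an aside, a quick shape check on the wrapper output (my own sketch; it assumes children() yields convnet before fc, reuses model_input_shape from above, and that the model is on CPU):

# The FeatureExtractor should emit a (batch, channels) embedding for the
# similarity targets to compare against; for this architecture the convnet
# output is (1, 59, 71, 71), so the [:, :, 0, 0] slice yields (1, 59).
with torch.no_grad():
    dummy = torch.zeros(model_input_shape)  # [1, 3, 300, 300]
    print(model(dummy).shape)               # expect torch.Size([1, 59])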
Update:
I fixed the above error by calling
car_concept_features.requires_grad_()
cloud_concept_features.requires_grad_()
before:
# Where is the car in the image
with GradCAM(model=model,
             target_layers=target_layers,
             use_cuda=False) as cam:
    car_grayscale_cam = cam(input_tensor=input_tensor,
                            targets=car_targets)[0, :]
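For anyone else landing here, a minimal sketch (illustrative tensors, not my actual model) of how this RuntimeError arises: backward() fails when no operand of the loss carries gradient tracking.

import torch

cos = torch.nn.CosineSimilarity(dim=0)
feats = torch.randn(8)                     # no grad history, like the stored concept features
out = torch.randn(8, requires_grad=True)   # stands in for a live model output

cos(out, feats).backward()      # fine: one operand tracks gradients
# cos(feats, feats).backward()  # raises: element 0 of tensors does not require grad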
However I now seem to be running into the following:
An exception occurred in CAM with block: <class 'numpy.AxisError'>. Message: axis 2 is out of bounds for array of dimension 0
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-9-5cdc1cb41639> in <module>
21
22
---> 23 car_cam_image = show_cam_on_image(image_float, car_grayscale_cam, use_rgb=True)
24 Image.fromarray(car_cam_image)
NameError: name 'car_grayscale_cam' is not defined
Looking back through the closed issues on this topic, it seems to be a problem with the layers I specify for target_layers. Am I correct in thinking this?
If so, I currently select loaded_model.module.convnet[-1] as the target, which corresponds to:
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
Any help with the above is greatly appreciated :)
Hi,
Can you please try loaded_model.module.convnet[-2] and tell me if it works?
We need the 2D CNN activations before the pooling.
Hi @jacobgil, I tried the suggestion but the same error occurs:
target_layers = [loaded_model.module.convnet[-2]]
car_targets = [SimilarityToConceptTarget(car_concept_features)]
cloud_targets = [SimilarityToConceptTarget(cloud_concept_features)]

# Where is the car in the image
with GradCAM(model=model,
             target_layers=target_layers,
             use_cuda=False) as cam:
    car_grayscale_cam = cam(input_tensor=input_tensor,
                            targets=car_targets)[0, :]
car_cam_image = show_cam_on_image(image_float, car_grayscale_cam, use_rgb=True)
Image.fromarray(car_cam_image)
Results in:
An exception occurred in CAM with block: <class 'numpy.AxisError'>. Message: axis 2 is out of bounds for array of dimension 0
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-13-784011a42087> in <module>
20 targets=car_targets)[0, :]
21
---> 22 car_cam_image = show_cam_on_image(image_float, car_grayscale_cam, use_rgb=True)
23 Image.fromarray(car_cam_image)
NameError: name 'car_grayscale_cam' is not defined
Am I correct in thinking the target_layers = [loaded_model.module.convnet[-2]] is where you wanted the change to be made?
Hi, sorry for the late response, I was traveling.
What does target_layers look like now? What is the output shape you expect from that layer? The CAM algorithms expect it to have the shape batch x channels x height x width. Is that what we have there?
If the dimensions are different, we will need to write a reshape_transform.
Also, what is the dimension of input_tensor?
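For reference, a reshape_transform is a callable passed to the CAM constructor that rearranges the captured activations into batch x channels x height x width. A minimal sketch for a hypothetical layer that emits (batch, tokens, channels), where tokens == height * width:

from pytorch_grad_cam import GradCAM

# Hypothetical example: rearrange (batch, tokens, channels) activations
# into the (batch, channels, height, width) layout the CAM expects.
def reshape_transform(tensor, height=14, width=14):
    result = tensor.reshape(tensor.size(0), height, width, tensor.size(2))
    return result.permute(0, 3, 1, 2)

cam = GradCAM(model=model, target_layers=target_layers,
              reshape_transform=reshape_transform)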
Hi, also sorry for my late reply.
Printing target_layers shows: [Conv2d(59, 59, kernel_size=(5, 5), stride=(1, 1))]
Using torchsummary to get the expected output shapes for a (3, 300, 300) image, summary(loaded_model.module, (3, 300, 300)) gives:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 59, 295, 295] 6,431
MaxPool2d-2 [-1, 59, 147, 147] 0
ReLU-3 [-1, 59, 147, 147] 0
Dropout-4 [-1, 59, 147, 147] 0
Conv2d-5 [-1, 59, 143, 143] 87,084
MaxPool2d-6 [-1, 59, 71, 71] 0
Linear-7 [-1, 106] 31,526,520
ReLU-8 [-1, 106] 0
================================================================
Total params: 31,620,035
Trainable params: 31,620,035
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 1.03
Forward/backward pass size (MB): 79.83
Params size (MB): 120.62
Estimated Total Size (MB): 201.48
----------------------------------------------------------------
The model's expected input is [1, 3, 300, 300], with images resized to (300, 300) beforehand: print(image.shape) gives (300, 300, 3), and input_tensor.shape gives torch.Size([1, 3, 300, 300]).
So I believe everything is in the shape expected, at least before passing to the model/CAM, however I may have missed something?
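If it helps narrow things down, here is a hook-based check (my own sketch) to confirm the activation shape the CAM actually captures from the chosen target layer:

# Hook the target layer during a forward pass; per the summary above the
# Conv2d-5 output should be (1, 59, 143, 143).
activations = {}

def save_activation(module, inputs, output):
    activations["target"] = output.detach()

handle = target_layers[0].register_forward_hook(save_activation)
_ = model(input_tensor)
handle.remove()
print(activations["target"].shape)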
Hello! Have you solved this problem now? I had the same problem.