pytorch-cnn-visualizations
image size problem
Hi, I found a bug in gradcam.py. PIL's Image module takes sizes in (w, h) order. However, the preprocessed image is a tensor of shape (1, c, h, w), so the order of w and h needs to be swapped when calling Image.resize.
On line 88, change
cam = np.uint8(Image.fromarray(cam).resize((input_image.shape[2], input_image.shape[3]), Image.ANTIALIAS))
to
cam = np.uint8(Image.fromarray(cam).resize((input_image.shape[3], input_image.shape[2]), Image.ANTIALIAS))
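The proposed change can be sketched in isolation as follows. This is a minimal illustration, not the repository's code: `resize_cam` is a hypothetical helper, the input shape is passed as a plain tuple standing in for `input_image.shape`, and `Image.LANCZOS` is used in place of `Image.ANTIALIAS` because the latter was removed in Pillow 10.

```python
import numpy as np
from PIL import Image


def resize_cam(cam, input_shape):
    """Resize a 2-D CAM with values in [0, 1] to the spatial size of a (1, C, H, W) input.

    Hypothetical helper for illustration only.
    """
    _, _, height, width = input_shape
    cam_img = Image.fromarray(np.uint8(cam * 255))
    # Image.resize expects (width, height), the reverse of the tensor's (H, W) order.
    resized = cam_img.resize((width, height), Image.LANCZOS)
    return np.asarray(resized, dtype=np.uint8)


# A 7x7 CAM upsampled for a rectangular input that is 224 high and 320 wide.
cam = np.random.rand(7, 7)
resized = resize_cam(cam, (1, 3, 224, 320))
print(resized.shape)  # (224, 320), i.e. (H, W), matching the input tensor
```

With the two indices in the original order, the result would have shape (320, 224) and would no longer line up with a non-square input.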
Hey,
Thanks for the notification! I will check/fix it by tonight.
Hello again,
Sorry for the (very) late reply. I think it is correct as it is. PIL uses HWC and torch uses BCHW, so H and W always come one after the other, in that order.
You can convince yourself that PIL uses HWC with:
from PIL import Image
import numpy as np
im = Image.open('image.png')
im_arr = np.asarray(im)
print(im_arr.shape)
I will check the output with a model that accepts inputs with different H and W sizes to see whether it is actually correct, though. Thanks for the heads up.
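A variant of the check above that needs no image file is sketched below. It uses a synthetic image with assumed dimensions (320 wide, 224 high) and prints both the NumPy array shape and PIL's own `size` attribute, which is where the two conventions differ.

```python
import numpy as np
from PIL import Image

# Synthetic image, 320 pixels wide and 224 pixels high (dimensions chosen for illustration).
im = Image.new('RGB', (320, 224))
im_arr = np.asarray(im)

print(im_arr.shape)  # (224, 320, 3): the array view is (H, W, C)
print(im.size)       # (320, 224): PIL's size tuple is (W, H)
```

The array layout is HWC, but size tuples passed to or returned by PIL, including the argument of Image.resize, are in (width, height) order.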
Hi,
I believe OP is correct. My code crashes on that line with rectangular input images. Regardless of what PIL uses behind the scenes, the Image.resize method takes its size as (W, H).
Implementing OP's fix stops the crash and results in the correct CAM (I verified it against CAMs from other packages).
Thanks for the heads up! I will have a look at it again!
Hi,
Indeed a problem; the solution from @zhaoxin111 helped. In addition, there is another bug when using Grad-CAM with larger image sizes: the adaptive pooling before the classifier is not applied for either AlexNet or VGG. I solved it by adding the following line before the flattening on line 43:
x = self.pool.avgpool(x)
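Why the pooling step matters can be sketched with a small stand-in model. Everything here is an assumption for illustration: `TinyNet` and `forward_with_pool` are hypothetical and are not the repository's classes, though the layout (features, then an adaptive average pool, then a linear classifier) mirrors how torchvision's AlexNet and VGG are organized.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for an AlexNet/VGG-style model."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((4, 4))
        self.classifier = nn.Linear(8 * 4 * 4, 10)


def forward_with_pool(model, x):
    x = model.features(x)
    # Adaptive pooling forces a 4x4 spatial map regardless of the input size,
    # so the flattened vector always matches the classifier's input width.
    x = model.avgpool(x)
    x = x.view(x.size(0), -1)
    return model.classifier(x)


model = TinyNet().eval()
with torch.no_grad():
    small = forward_with_pool(model, torch.randn(1, 3, 32, 32))
    large = forward_with_pool(model, torch.randn(1, 3, 64, 96))
print(small.shape, large.shape)  # both are torch.Size([1, 10])
```

If the `avgpool` call is skipped, the flattened feature vector grows with the input size and the linear layer raises a shape mismatch error for anything other than the size the classifier was built for.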