mmaction2
I got an error when using Grad-CAM for TimeSformer
Hello, I'm trying to visualize the Grad-CAM of my TimeSformer model. I tried different target layer names, but without success. The error is either:

AttributeError: 'NoneType' object has no attribute 'size'

or

_, c, tg, _, _ = gradients.size()
ValueError: not enough values to unpack (expected 5, got 3)

I'm using a rawframes dataset. Thanks for your help.
@kennymckormick Could you please see my problem
GradCAM hasn't supported the transformer-based models yet. @irvingzhang0512 do you have some time to look at this problem?
I don't know if reshaping gradient could make sense.
I'll take a look in August
I am also facing the same issue when training on the Kinetics dataset.
Does GradCAM in mmaction2 support transformer-based models?
I edited the gradcam_utils.py file (line 119) in the following way to support transformer-based models.
gradients = self.target_gradients
activations = self.target_activations
if self.is_recognizer2d:
    # [B*Tg, C', H', W']
    b_tg, c, _, _ = gradients.size()
    tg = b_tg // b
else:
    grad = gradients.size()
    # implement for transformer
    if len(grad) == 3:
        # transformer output: [B, Tg*H'*W', C']
        _, tg, c = grad
        # Edit feature_h and feature_w to match your transformer
        feature_h = 14
        feature_w = 14
        tg //= feature_h * feature_w
        # [B, Tg*H'*W', C'] -> [B, Tg, H', W', C'] -> [B, Tg, C', H', W']
        gradients = gradients.reshape(-1, tg, feature_h, feature_w, c)
        gradients = gradients.permute(0, 1, 4, 2, 3)
        activations = activations.reshape(-1, tg, feature_h, feature_w, c)
        activations = activations.permute(0, 1, 4, 2, 3)
    elif len(grad) == 5:
        # [B, C', Tg, H', W'] -> [B, Tg, C', H', W']
        _, c, tg, _, _ = grad
        gradients = gradients.permute(0, 2, 1, 3, 4)
        activations = activations.permute(0, 2, 1, 3, 4)
    else:
        raise NotImplementedError('Please check grad shape')
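As a sanity check, the 3-D branch above can be exercised on a dummy tensor. The shapes here are illustrative assumptions (a 14x14 patch grid over 8 frames with 768 channels), not values taken from any particular checkpoint:

```python
import torch

# Dummy TimeSformer-like gradient without a cls token:
# batch 1, 8 frames x 14x14 patches = 1568 tokens, 768 channels.
b, tg, feature_h, feature_w, c = 1, 8, 14, 14, 768
gradients = torch.randn(b, tg * feature_h * feature_w, c)

# Recover the temporal size from the token count, as in the patch above.
tg_recovered = gradients.size(1) // (feature_h * feature_w)

# [B, Tg*H'*W', C'] -> [B, Tg, H', W', C'] -> [B, Tg, C', H', W']
gradients = gradients.reshape(-1, tg_recovered, feature_h, feature_w, c)
gradients = gradients.permute(0, 1, 4, 2, 3)

print(tuple(gradients.shape))  # (1, 8, 768, 14, 14)
```

If the reshape raises a size mismatch, the hard-coded feature_h/feature_w do not match the layer you hooked.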
@ZechengLi19 Thank you very much! I tried your modification, but something went wrong. First, I assume that feature_h = img_size / patch_size. Second, I set the target-layer-name parameter to 'backbone/transformer_layers/layers/11/ffns/0/layers/1'. Under these settings, the resulting video did not highlight any distinct regions. I'd appreciate it very much if you could give me some suggestions!
grad = gradients.size()
# implement for transformer
if len(grad) == 3:
    # transformer output: [B, 1 + Tg*H'*W', C'] (with cls token)
    _, tg, c = grad
    # Edit feature_h and feature_w to match your transformer
    feature_h = 14
    feature_w = 14
    tg //= feature_h * feature_w
    # drop the cls token before reshaping
    gradients = gradients[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
    gradients = gradients.permute(0, 1, 4, 2, 3)
    activations = activations[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
    activations = activations.permute(0, 1, 4, 2, 3)
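Regarding the feature_h guess: for a standard ViT-style patch embedding, feature_h = img_size // patch_size, and with a class token the layer outputs 1 + Tg*feature_h*feature_w tokens, which is what the [:, 1:, :] slice accounts for. A small sketch (the 224-pixel input, 16-pixel patches, and 8 frames are assumptions; read the real values from your config):

```python
# Spatial feature size for a standard ViT patch embedding (assumed values).
img_size = 224
patch_size = 16
tg = 8  # number of frames

feature_h = feature_w = img_size // patch_size  # 224 // 16 = 14

# With a class token the layer outputs 1 + Tg*H'*W' tokens; dropping
# index 0 restores an exact Tg x H' x W' grid.
n_tokens = 1 + tg * feature_h * feature_w
print(feature_h, n_tokens)  # 14 1569
```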
Maybe you can share your gradcam_utils.py file with me, so that I can help you find the bug.
I modified line 126 of the gradcam_utils.py file and solved the problem by adding two lines of code. Maybe you can try this method. By the way, I found this method in this GitHub repository: https://github.com/MartinXM/TPS.
if self.is_recognizer2d:
    # [B*Tg, C', H', W']
    b_tg, c, _, _ = gradients.size()
    tg = b_tg // b
else:
    # transformer output [B, Tg, H', W', C'] -> [B, C', Tg, H', W']
    gradients = gradients.permute(0, 4, 1, 2, 3)
    activations = activations.permute(0, 4, 1, 2, 3)
    _, c, tg, _, _ = gradients.size()
    # target shape: [B, Tg, C', H', W']
    gradients = gradients.permute(0, 2, 1, 3, 4)
    activations = activations.permute(0, 2, 1, 3, 4)
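The effect of the two added permute lines can be checked with a dummy tensor. The [B, Tg, H', W', C'] input layout and the sizes below are assumptions about the backbone's output, chosen only to make the shapes concrete:

```python
import torch

# Dummy gradient in [B, Tg, H', W', C'] layout.
b, tg, h, w, c = 1, 8, 7, 7, 768
gradients = torch.randn(b, tg, h, w, c)

gradients = gradients.permute(0, 4, 1, 2, 3)  # -> [B, C', Tg, H', W']
gradients = gradients.permute(0, 2, 1, 3, 4)  # -> [B, Tg, C', H', W']

print(tuple(gradients.shape))  # (1, 8, 768, 7, 7)
```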
Thank you for your suggestions! When I set the target-layer-name to 'backbone/transformer_layers/layers/11/ffns/0/layers/1', gradients has only 3 dimensions, so unpacking 5 values fails. I have no idea what happened; maybe I set the target-layer-name incorrectly.
@ZechengLi19 I edited the gradcam_utils.py file (line 119) as you suggested. The only differences are on lines 138 and 140, where I leave out the first element (I guess this may be the cls_token) in gradients and activations so that the reshape succeeds. Here are the modifications:
gradients = self.target_gradients
activations = self.target_activations
if self.is_recognizer2d:
    # [B*Tg, C', H', W']
    b_tg, c, _, _ = gradients.size()
    tg = b_tg // b
else:
    grad = gradients.size()
    # implement for transformer
    if len(grad) == 3:
        # transformer output: [B, 1 + Tg*H'*W', C'] (with cls token)
        _, tg, c = grad
        # Edit feature_h and feature_w to match your transformer
        feature_h = 14
        feature_w = 14
        tg //= feature_h * feature_w
        # drop the cls token before reshaping
        gradients = gradients[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
        gradients = gradients.permute(0, 1, 4, 2, 3)
        activations = activations[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
        activations = activations.permute(0, 1, 4, 2, 3)
    elif len(grad) == 5:
        # [B, C', Tg, H', W'] -> [B, Tg, C', H', W']
        _, c, tg, _, _ = grad
        gradients = gradients.permute(0, 2, 1, 3, 4)
        activations = activations.permute(0, 2, 1, 3, 4)
    else:
        raise NotImplementedError('Please check grad shape')
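The [:, 1:, :] slice can be checked the same way. With the leading class token included, the token count is 1 + Tg*H'*W', and the integer division still recovers Tg because the extra token floors away (shapes below are illustrative assumptions):

```python
import torch

b, tg, h, w, c = 1, 8, 14, 14, 768
# 1 cls token + 1568 patch tokens
gradients = torch.randn(b, 1 + tg * h * w, c)

_, n_tokens, _ = gradients.size()
tg_recovered = n_tokens // (h * w)  # 1569 // 196 == 8 (the +1 floors away)

# Drop the cls token, then reshape to [B, Tg, C', H', W'].
gradients = gradients[:, 1:, :].reshape(-1, tg_recovered, h, w, c)
gradients = gradients.permute(0, 1, 4, 2, 3)
print(tuple(gradients.shape))  # (1, 8, 768, 14, 14)
```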
Does this mean the code is already working?
Yes, but the results are meaningless. They show no sign of any salient region in the input video.