
I got an error when using Grad-CAM for TimeSformer

YNawal opened this issue 3 years ago · 14 comments

Hello, I'm trying to visualize the Grad-CAM output of my TimeSformer model. I tried different target layer names, but without success. The error is either:

    AttributeError: 'NoneType' object has no attribute 'size'

or:

    _, c, tg, _, _ = gradients.size()
    ValueError: not enough values to unpack (expected 5, got 3)

I'm using a rawframes dataset. Thanks for your help!
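
For reference, a minimal sketch of how Grad-CAM is typically set up in mmaction2 0.x (the config and checkpoint paths are placeholders, and the layer name is just the candidate tried later in this thread, not a verified choice):

    # Hedged sketch: build a recognizer and wrap it with mmaction2's GradCAM.
    from mmaction.apis import init_recognizer
    from mmaction.utils import GradCAM

    model = init_recognizer('timesformer_config.py', 'timesformer.pth',
                            device='cuda:0')
    gradcam = GradCAM(
        model, 'backbone/transformer_layers/layers/11/ffns/0/layers/1')
    # `gradcam(data)` expects a batch prepared by the test pipeline; see
    # demo/demo_gradcam.py in the mmaction2 repo for the full data flow.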

YNawal avatar Jul 26 '21 10:07 YNawal

@kennymckormick Could you please take a look at my problem?

YNawal avatar Jul 28 '21 16:07 YNawal

GradCAM doesn't support transformer-based models yet. @irvingzhang0512, do you have some time to look at this problem?

kennymckormick avatar Jul 29 '21 04:07 kennymckormick

I don't know whether reshaping the gradients would make sense.

YNawal avatar Jul 29 '21 08:07 YNawal

> GradCAM doesn't support transformer-based models yet. @irvingzhang0512, do you have some time to look at this problem?

I'll take a look in August

irvingzhang0512 avatar Jul 31 '21 05:07 irvingzhang0512

I am also facing the same issue with the Kinetics dataset during training.

Tortoise17 avatar Aug 26 '21 08:08 Tortoise17

Does GradCAM in mmaction2 support transformer-based models?

yehuixie avatar Jul 11 '23 03:07 yehuixie

I edited the gradcam_utils.py file (line 119) in the following way to support transformer-based models.

    gradients = self.target_gradients
    activations = self.target_activations
    if self.is_recognizer2d:
        # [B*Tg, C', H', W']
        b_tg, c, _, _ = gradients.size()
        tg = b_tg // b
    else:
        grad = gradients.size()
        # transformer branch: token sequence of shape [B, N, C']
        if len(grad) == 3:
            _, tg, c = grad

            # --- edit feature_h and feature_w to match your transformer ---
            feature_h = 14
            feature_w = 14
            # ---------------------------------------------------------------

            # number of temporal groups: N / (H' * W')
            tg = int(tg / (feature_h * feature_w))
            # [B, N, C'] -> [B, Tg, H', W', C'] -> [B, Tg, C', H', W']
            gradients = gradients.reshape(-1, tg, feature_h, feature_w, c)
            gradients = gradients.permute(0, 1, 4, 2, 3)
            activations = activations.reshape(-1, tg, feature_h, feature_w, c)
            activations = activations.permute(0, 1, 4, 2, 3)
        elif len(grad) == 5:
            # source shape [B, C', Tg, H', W'] -> target shape [B, Tg, C', H', W']
            _, c, tg, _, _ = grad
            gradients = gradients.permute(0, 2, 1, 3, 4)
            activations = activations.permute(0, 2, 1, 3, 4)
        else:
            raise NotImplementedError('Please check the gradient shape')
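
For a ViT-style backbone, feature_h and feature_w are just the patch-grid size, i.e. img_size // patch_size (224 // 16 = 14 for the common ViT-B setting), so they can be derived instead of hard-coded. A minimal sketch, assuming the backbone exposes img_size and patch_size attributes (those attribute names are an assumption; check your model):

    # Hedged sketch: derive the spatial token grid from the backbone.
    # mmaction2's GradCAM stores the recognizer as `self.model`; the
    # `img_size` / `patch_size` attributes are assumptions about the backbone.
    patch_size = getattr(self.model.backbone, 'patch_size', 16)
    img_size = getattr(self.model.backbone, 'img_size', 224)
    feature_h = img_size // patch_size
    feature_w = img_size // patch_size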

ZechengLi19 avatar Aug 18 '23 19:08 ZechengLi19

@ZechengLi19 Thank you very much! I tried your modification but something went wrong. First, I suppose that feature_h = img_size / patch_size. Second, the target-layer-name param is set to 'backbone/transformer_layers/layers/11/ffns/0/layers/1'. Under these settings, the resulting video did not exhibit any distinct or notable regions. I'd appreciate it very much if you could provide some suggestions!

    grad = gradients.size()
    # transformer branch: token sequence of shape [B, N, C']
    if len(grad) == 3:
        _, tg, c = grad

        # --- edit feature_h and feature_w to match your transformer ---
        feature_h = 14
        feature_w = 14
        # ---------------------------------------------------------------

        tg = int(tg / (feature_h * feature_w))
        # drop the first token (cls_token) before reshaping
        gradients = gradients[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
        gradients = gradients.permute(0, 1, 4, 2, 3)
        activations = activations[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
        activations = activations.permute(0, 1, 4, 2, 3)

EaaloZ avatar Feb 28 '24 08:02 EaaloZ

> @ZechengLi19 Thank you very much! I tried your modification but something went wrong. [...]

Maybe you can share your gradcam_utils.py file with me, so that I can help you find the bug.

ZechengLi19 avatar Feb 28 '24 08:02 ZechengLi19

> @ZechengLi19 Thank you very much! I tried your modification but something went wrong. [...]

I have modified line 126 of the gradcam_utils.py file and solved the problem by adding two lines of code. Maybe you can try this method. By the way, I found this method in the GitHub repository https://github.com/MartinXM/TPS.

    if self.is_recognizer2d:
        # [B*Tg, C', H', W']
        b_tg, c, _, _ = gradients.size()
        tg = b_tg // b
    else:
        # added: channels-last output [B, Tg, H', W', C'] -> [B, C', Tg, H', W']
        gradients = gradients.permute(0, 4, 1, 2, 3)
        activations = activations.permute(0, 4, 1, 2, 3)
        _, c, tg, _, _ = gradients.size()
        # target shape: [B, Tg, C', H', W']
        gradients = gradients.permute(0, 2, 1, 3, 4)
        activations = activations.permute(0, 2, 1, 3, 4)
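
The two added permute calls assume the hooked layer emits a channels-last 5-D tensor. A quick standalone check of the shape transform (the dimensions below are made up for illustration):

    import torch

    x = torch.randn(2, 8, 7, 7, 768)  # assumed [B, Tg, H', W', C'] output
    x = x.permute(0, 4, 1, 2, 3)      # -> [B, C', Tg, H', W']
    assert x.shape == (2, 768, 8, 7, 7)
    x = x.permute(0, 2, 1, 3, 4)      # -> [B, Tg, C', H', W']
    assert x.shape == (2, 8, 768, 7, 7)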

yehuixie avatar Feb 28 '24 09:02 yehuixie

> I have modified line 126 of the gradcam_utils.py file and solved the problem by adding two lines of code. [...]

Thank you for your suggestions! When I set the target-layer-name to 'backbone/transformer_layers/layers/11/ffns/0/layers/1', the gradients tensor has only 3 dimensions, so the 5-dimension permute fails. I have no idea what happened; maybe I set the target-layer-name incorrectly.
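
One way to choose a suitable layer is to print the output shapes of candidate modules before running Grad-CAM. A minimal sketch, assuming model is the loaded recognizer (the name filter below is only an example):

    import torch

    # Register forward hooks that print each candidate layer's output shape;
    # after one forward pass, pick a layer whose output matches what the
    # Grad-CAM code expects (3-D token sequence or 5-D feature map).
    def make_shape_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor):
                print(name, tuple(output.shape))
        return hook

    for name, module in model.named_modules():
        if 'transformer_layers.layers.11' in name:  # example filter
            module.register_forward_hook(make_shape_hook(name))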

EaaloZ avatar Feb 28 '24 09:02 EaaloZ

@ZechengLi19 I edited the gradcam_utils.py file (line 119) as you suggested. The only differences lie on lines 138 and 140, where I leave out the first element (I guess this may be the cls_token) of the gradients and activations so that the reshape works. Here are the modifications:

    gradients = self.target_gradients
    activations = self.target_activations
    if self.is_recognizer2d:
        # [B*Tg, C', H', W']
        b_tg, c, _, _ = gradients.size()
        tg = b_tg // b
    else:
        grad = gradients.size()
        # transformer branch: token sequence of shape [B, N, C']
        if len(grad) == 3:
            _, tg, c = grad

            # --- edit feature_h and feature_w to match your transformer ---
            feature_h = 14
            feature_w = 14
            # ---------------------------------------------------------------

            tg = int(tg / (feature_h * feature_w))
            # drop the first token (cls_token), then
            # [B, Tg*H'*W', C'] -> [B, Tg, H', W', C'] -> [B, Tg, C', H', W']
            gradients = gradients[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
            gradients = gradients.permute(0, 1, 4, 2, 3)
            activations = activations[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
            activations = activations.permute(0, 1, 4, 2, 3)
        elif len(grad) == 5:
            # source shape [B, C', Tg, H', W'] -> target shape [B, Tg, C', H', W']
            _, c, tg, _, _ = grad
            gradients = gradients.permute(0, 2, 1, 3, 4)
            activations = activations.permute(0, 2, 1, 3, 4)
        else:
            raise NotImplementedError('Please check the gradient shape')
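
Note that tg is computed from the full token count, which still includes the cls_token, so int(tg / (feature_h * feature_w)) only lands on the right value because int() truncates. Subtracting the extra token first would be more explicit. A small arithmetic check under assumed dimensions (8 frames, a 14x14 grid, one cls_token):

    # Assumed: 8 temporal groups, 14x14 spatial tokens, plus one cls_token.
    n_tokens = 1 + 8 * 14 * 14                  # 1569
    assert int(n_tokens / (14 * 14)) == 8       # 8.005... -> 8, works by truncation
    assert (n_tokens - 1) // (14 * 14) == 8     # explicit: remove the cls_token first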

EaaloZ avatar Feb 28 '24 09:02 EaaloZ

> @ZechengLi19 I edited the gradcam_utils.py file (line 119) as you suggested. The only differences lie on lines 138 and 140, where I leave out the first element (I guess this may be the cls_token). [...]

Does this mean the code is already working?

ZechengLi19 avatar Feb 28 '24 10:02 ZechengLi19


> Does this mean the code is already working?

Yes, but the results are meaningless: they show no salient regions of the input video.

EaaloZ avatar Feb 29 '24 08:02 EaaloZ