
I got an error when using Grad-CAM for TimeSformer

YNawal opened this issue 3 years ago · 14 comments

Hello, I'm trying to visualize the Grad-CAM output of my TimeSformer model. I tried different target layer names, but without success. The error is either:

    AttributeError: 'NoneType' object has no attribute 'size'

or:

    _, c, tg, _, _ = gradients.size()
    ValueError: not enough values to unpack (expected 5, got 3)

I'm using a rawframes dataset. Thanks for your help!
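
For reference, a minimal sketch of how Grad-CAM is typically set up in mmaction2 0.x (the config and checkpoint paths are placeholders, and the layer name is just the candidate tried later in this thread, not a verified choice):

    # Hedged sketch: build a recognizer and wrap it with mmaction2's GradCAM.
    from mmaction.apis import init_recognizer
    from mmaction.utils import GradCAM

    model = init_recognizer('timesformer_config.py', 'timesformer.pth',
                            device='cuda:0')
    gradcam = GradCAM(
        model, 'backbone/transformer_layers/layers/11/ffns/0/layers/1')
    # `gradcam(data)` expects a batch prepared by the test pipeline; see
    # demo/demo_gradcam.py in the mmaction2 repo for the full data flow.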

YNawal avatar Jul 26 '21 10:07 YNawal

@kennymckormick Could you please take a look at my problem?

YNawal avatar Jul 28 '21 16:07 YNawal

GradCAM doesn't support transformer-based models yet. @irvingzhang0512, do you have some time to look at this problem?

kennymckormick avatar Jul 29 '21 04:07 kennymckormick

I don't know whether reshaping the gradients would make sense.

YNawal avatar Jul 29 '21 08:07 YNawal

> GradCAM doesn't support transformer-based models yet. @irvingzhang0512, do you have some time to look at this problem?

I'll take a look in August

irvingzhang0512 avatar Jul 31 '21 05:07 irvingzhang0512

I am also facing the same issue with the Kinetics dataset during training.

Tortoise17 avatar Aug 26 '21 08:08 Tortoise17

Does GradCAM in mmaction2 support transformer-based models?

yehuixie avatar Jul 11 '23 03:07 yehuixie

I edited the gradcam_utils.py file (line 119) in the following way to support transformer-based models.

    gradients = self.target_gradients
    activations = self.target_activations
    if self.is_recognizer2d:
        # [B*Tg, C', H', W']
        b_tg, c, _, _ = gradients.size()
        tg = b_tg // b
    else:
        grad = gradients.size()
        # transformer branch: token sequence of shape [B, N, C']
        if len(grad) == 3:
            _, tg, c = grad

            # --- edit feature_h and feature_w to match your transformer ---
            feature_h = 14
            feature_w = 14
            # ---------------------------------------------------------------

            # number of temporal groups: N / (H' * W')
            tg = int(tg / (feature_h * feature_w))
            # [B, N, C'] -> [B, Tg, H', W', C'] -> [B, Tg, C', H', W']
            gradients = gradients.reshape(-1, tg, feature_h, feature_w, c)
            gradients = gradients.permute(0, 1, 4, 2, 3)
            activations = activations.reshape(-1, tg, feature_h, feature_w, c)
            activations = activations.permute(0, 1, 4, 2, 3)
        elif len(grad) == 5:
            # source shape [B, C', Tg, H', W'] -> target shape [B, Tg, C', H', W']
            _, c, tg, _, _ = grad
            gradients = gradients.permute(0, 2, 1, 3, 4)
            activations = activations.permute(0, 2, 1, 3, 4)
        else:
            raise NotImplementedError('Please check the gradient shape')
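
For a ViT-style backbone, feature_h and feature_w are just the patch-grid size, i.e. img_size // patch_size (224 // 16 = 14 for the common ViT-B setting), so they can be derived instead of hard-coded. A minimal sketch, assuming the backbone exposes img_size and patch_size attributes (those attribute names are an assumption; check your model):

    # Hedged sketch: derive the spatial token grid from the backbone.
    # mmaction2's GradCAM stores the recognizer as `self.model`; the
    # `img_size` / `patch_size` attributes are assumptions about the backbone.
    patch_size = getattr(self.model.backbone, 'patch_size', 16)
    img_size = getattr(self.model.backbone, 'img_size', 224)
    feature_h = img_size // patch_size
    feature_w = img_size // patch_size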

ZechengLi19 avatar Aug 18 '23 19:08 ZechengLi19

@ZechengLi19 Thank you very much! I tried your modification but something went wrong. First, I suppose that feature_h = img_size / patch_size. Second, the target-layer-name param is set to 'backbone/transformer_layers/layers/11/ffns/0/layers/1'. Under these settings, the resulting video did not exhibit any distinct or notable regions. I'd appreciate it very much if you could provide some suggestions!

    grad = gradients.size()
    # transformer branch: token sequence of shape [B, N, C']
    if len(grad) == 3:
        _, tg, c = grad

        # --- edit feature_h and feature_w to match your transformer ---
        feature_h = 14
        feature_w = 14
        # ---------------------------------------------------------------

        tg = int(tg / (feature_h * feature_w))
        # drop the first token (cls_token) before reshaping
        gradients = gradients[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
        gradients = gradients.permute(0, 1, 4, 2, 3)
        activations = activations[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
        activations = activations.permute(0, 1, 4, 2, 3)

EaaloZ avatar Feb 28 '24 08:02 EaaloZ

> @ZechengLi19 Thank you very much! I tried your modification but something went wrong. [...]

Maybe you can share your gradcam_utils.py file with me, so that I can help you find the bug.

ZechengLi19 avatar Feb 28 '24 08:02 ZechengLi19

> @ZechengLi19 Thank you very much! I tried your modification but something went wrong. [...]

I have modified line 126 of the gradcam_utils.py file and solved the problem by adding two lines of code. Maybe you can try this method. By the way, I found this method in the GitHub repository https://github.com/MartinXM/TPS.

    if self.is_recognizer2d:
        # [B*Tg, C', H', W']
        b_tg, c, _, _ = gradients.size()
        tg = b_tg // b
    else:
        # added: channels-last output [B, Tg, H', W', C'] -> [B, C', Tg, H', W']
        gradients = gradients.permute(0, 4, 1, 2, 3)
        activations = activations.permute(0, 4, 1, 2, 3)
        _, c, tg, _, _ = gradients.size()
        # target shape: [B, Tg, C', H', W']
        gradients = gradients.permute(0, 2, 1, 3, 4)
        activations = activations.permute(0, 2, 1, 3, 4)
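
The two added permute calls assume the hooked layer emits a channels-last 5-D tensor. A quick standalone check of the shape transform (the dimensions below are made up for illustration):

    import torch

    x = torch.randn(2, 8, 7, 7, 768)  # assumed [B, Tg, H', W', C'] output
    x = x.permute(0, 4, 1, 2, 3)      # -> [B, C', Tg, H', W']
    assert x.shape == (2, 768, 8, 7, 7)
    x = x.permute(0, 2, 1, 3, 4)      # -> [B, Tg, C', H', W']
    assert x.shape == (2, 8, 768, 7, 7)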

yehuixie avatar Feb 28 '24 09:02 yehuixie

> I have modified line 126 of the gradcam_utils.py file and solved the problem by adding two lines of code. [...]

Thank you for your suggestions! When I set the target-layer-name to 'backbone/transformer_layers/layers/11/ffns/0/layers/1', the gradients tensor has only 3 dimensions, so the 5-dimension permute fails. I have no idea what happened; maybe I set the target-layer-name incorrectly.
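
One way to choose a suitable layer is to print the output shapes of candidate modules before running Grad-CAM. A minimal sketch, assuming model is the loaded recognizer (the name filter below is only an example):

    import torch

    # Register forward hooks that print each candidate layer's output shape;
    # after one forward pass, pick a layer whose output matches what the
    # Grad-CAM code expects (3-D token sequence or 5-D feature map).
    def make_shape_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor):
                print(name, tuple(output.shape))
        return hook

    for name, module in model.named_modules():
        if 'transformer_layers.layers.11' in name:  # example filter
            module.register_forward_hook(make_shape_hook(name))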

EaaloZ avatar Feb 28 '24 09:02 EaaloZ

@ZechengLi19 I edited the gradcam_utils.py file (line 119) as you suggested. The only differences lie on lines 138 and 140, where I leave out the first element (I guess this may be the cls_token) of the gradients and activations so that the reshape works. Here are the modifications:

    gradients = self.target_gradients
    activations = self.target_activations
    if self.is_recognizer2d:
        # [B*Tg, C', H', W']
        b_tg, c, _, _ = gradients.size()
        tg = b_tg // b
    else:
        grad = gradients.size()
        # transformer branch: token sequence of shape [B, N, C']
        if len(grad) == 3:
            _, tg, c = grad

            # --- edit feature_h and feature_w to match your transformer ---
            feature_h = 14
            feature_w = 14
            # ---------------------------------------------------------------

            tg = int(tg / (feature_h * feature_w))
            # drop the first token (cls_token), then
            # [B, Tg*H'*W', C'] -> [B, Tg, H', W', C'] -> [B, Tg, C', H', W']
            gradients = gradients[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
            gradients = gradients.permute(0, 1, 4, 2, 3)
            activations = activations[:, 1:, :].reshape(-1, tg, feature_h, feature_w, c)
            activations = activations.permute(0, 1, 4, 2, 3)
        elif len(grad) == 5:
            # source shape [B, C', Tg, H', W'] -> target shape [B, Tg, C', H', W']
            _, c, tg, _, _ = grad
            gradients = gradients.permute(0, 2, 1, 3, 4)
            activations = activations.permute(0, 2, 1, 3, 4)
        else:
            raise NotImplementedError('Please check the gradient shape')
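
Note that tg is computed from the full token count, which still includes the cls_token, so int(tg / (feature_h * feature_w)) only lands on the right value because int() truncates. Subtracting the extra token first would be more explicit. A small arithmetic check under assumed dimensions (8 frames, a 14x14 grid, one cls_token):

    # Assumed: 8 temporal groups, 14x14 spatial tokens, plus one cls_token.
    n_tokens = 1 + 8 * 14 * 14                  # 1569
    assert int(n_tokens / (14 * 14)) == 8       # 8.005... -> 8, works by truncation
    assert (n_tokens - 1) // (14 * 14) == 8     # explicit: remove the cls_token first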

EaaloZ avatar Feb 28 '24 09:02 EaaloZ

> @ZechengLi19 I edited the gradcam_utils.py file (line 119) as you suggested. The only differences lie on lines 138 and 140, where I leave out the first element (I guess this may be the cls_token). [...]

Does this mean the code is already working?

ZechengLi19 avatar Feb 28 '24 10:02 ZechengLi19


> Does this mean the code is already working?

Yes, but the results are meaningless: they show no salient regions of the input video.

EaaloZ avatar Feb 29 '24 08:02 EaaloZ