
ChunkVideoSwin has no gradients

Open SimoLoca opened this issue 1 year ago • 7 comments

Hi, I have a question related to the training of TALLFormer. In particular, I noticed that when training the backbone, ChunkVideoSwin has its gradients set to None (I think this may lead to problems during the backward computation). Is this normal behaviour, or is something wrong?

To test this, I've inserted the following lines here: https://github.com/klauscc/TALLFormer/blob/5519140e39095cd87d9b50420bde912975cae9fb/vedatad/models/detectors/mem_single_stage_detector.py#L67

for name, param in self.backbone.named_parameters():
    print("name: ", name, "grad: ", param.grad)

SimoLoca avatar Apr 21 '23 14:04 SimoLoca

Hi SimoLoca, the gradients should be None during forward. You can only get gradients after loss.backward() and before optimizer.zero_grad(). If you want to see the gradients, you can insert your code right after L23: https://github.com/klauscc/TALLFormer/blob/5519140e39095cd87d9b50420bde912975cae9fb/vedacore/hooks/optimizer.py#L23
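
For illustration, here is a minimal, self-contained sketch of the .grad lifecycle in plain PyTorch (not TALLFormer code):

import torch

model = torch.nn.Linear(4, 2)
loss = model(torch.randn(3, 4)).sum()

print(model.weight.grad)            # None: backward has not run yet
loss.backward()
print(model.weight.grad is None)    # False: gradients are now populated
model.zero_grad(set_to_none=True)
print(model.weight.grad)            # None again after zero_grad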

klauscc avatar Apr 21 '23 17:04 klauscc

Hi @klauscc, thanks for the fast reply. I tried what you suggested: right after L23 I inserted:

for name, param in looper.train_engine.model.backbone.named_parameters():
    print("name: ", name, "grad: ", param.grad)
for name, param in looper.train_engine.model.neck.named_parameters():
    print("name: ", name, "grad: ", param.grad)
for name, param in looper.train_engine.model.head.named_parameters():
    print("name: ", name, "grad: ", param.grad)

Interestingly, the backbone has all gradients set to None, while the neck and head have gradients. Is this behaviour correct? And lastly, does this mean that the backbone is frozen during training, and if so, how can I "unfreeze" it? Thanks so much!

SimoLoca avatar Apr 22 '23 15:04 SimoLoca

Hi @SimoLoca, I did a quick check and the backbone is indeed updated during training:

>>> import torch
>>> s1 = torch.load('epoch_600_weights.pth',map_location="cpu")
>>> s2 = torch.load('epoch_1000_weights.pth',map_location="cpu")
>>> w1 = s1['backbone.layers.2.blocks.16.mlp.fc2.bias']
>>> w2 = s2['backbone.layers.2.blocks.16.mlp.fc2.bias']
>>> torch.allclose(w1,w2)
False
>>> w1[:10]
tensor([ 0.0496,  0.0174,  0.0173, -0.1023,  0.0316,  0.8908, -0.1456, -0.1831,
        -0.3061, -0.3634])
>>> w2[:10]
tensor([ 0.0492,  0.0165,  0.0165, -0.1018,  0.0315,  0.8822, -0.1449, -0.1810,
        -0.3043, -0.3599])
>>>

In the config file: https://github.com/klauscc/TALLFormer/blob/main/configs/trainval/thumos/1.0.0-vswin_b_256x256-12GB.py#L99 the first two stages of the backbone are frozen. Swin-B has 24 blocks; we only tune the last 20 (the last two stages). Did you only check the first several parameters?
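
As a sanity check (a generic sketch; the exact parameter names depend on the Swin implementation), you can list which backbone parameters actually require gradients. To unfreeze more of the backbone, you would reduce the number of frozen stages in the backbone config (e.g. the frozen_stages option in Video Swin):

# Generic sketch: count trainable vs. frozen backbone parameters.
frozen, trainable = [], []
for name, param in model.backbone.named_parameters():
    (trainable if param.requires_grad else frozen).append(name)
print(f"frozen: {len(frozen)}  trainable: {len(trainable)}")

# With the first two stages frozen, only parameters under
# backbone.layers.2 and backbone.layers.3 should appear in `trainable`.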

klauscc avatar Apr 23 '23 18:04 klauscc

I checked that by following the README for training the model, without loading a checkpoint, and I did not change the config file. Is my way of checking whether the backbone's weights are updated wrong?

SimoLoca avatar Apr 23 '23 19:04 SimoLoca

Hi @klauscc, I've resolved the issue. There is no problem with the code; there were some errors in my config file, so forgive me if I disturbed you too much. Just one last question: during the feature extraction phase, might it make sense to use a stride? For example, with a stride of 16, processing frames 0-32, then 16-48, and so on?

Thank you so much!

SimoLoca avatar Apr 27 '23 15:04 SimoLoca

It's great you figured it out! Yes, I believe extracting features with a stride may lead to higher performance, but the computational cost will increase, and you would need to make some changes to the backbone code to process frames that way.
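
As a rough illustration of the idea (a generic sketch, not TALLFormer code; extract_chunk is a hypothetical stand-in for the backbone's per-chunk forward, frames is assumed to be a float tensor, and overlapping features are simply averaged, which is only one possible merge strategy):

import torch

def strided_features(frames, extract_chunk, window=32, stride=16):
    """frames: (T, ...) tensor; extract_chunk returns a (window, C) feature
    per clip. Windows cover frames 0-32, 16-48, ...; overlaps are averaged."""
    T = frames.shape[0]
    feat_sum, count = None, None
    for start in range(0, T - window + 1, stride):
        f = extract_chunk(frames[start:start + window])
        if feat_sum is None:
            feat_sum = frames.new_zeros(T, f.shape[-1])
            count = frames.new_zeros(T, 1)
        feat_sum[start:start + window] += f
        count[start:start + window] += 1
    return feat_sum / count.clamp(min=1)  # uncovered tail frames stay zero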

klauscc avatar May 05 '23 21:05 klauscc

OK, thank you. So do I need to make the changes in SwinTransformer3D or in ChunkVideoSwin?

SimoLoca avatar May 08 '23 08:05 SimoLoca