Tune-A-Video
unet.down_blocks does not seem to be updating
I ran the training code with two GPUs and got the error message: Parameters which did not receive grad for rank 0: down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_out.0.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_v.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_k.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_q.weight, xxx.
I double-checked and found that down_blocks.0.attentions.0.transformer_blocks.0.attn_temp.to_v.weight is always zero.
This issue may result from the torch.utils.checkpoint.checkpoint call in https://github.com/showlab/Tune-A-Video/blob/main/tuneavideo/models/unet_blocks.py#L300.
Reference: https://github.com/huggingface/transformers/issues/21381 "gradient checkpointing disables requires_grad when freezing part of models (fix with use_reentrant=False)"
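To illustrate the mechanism outside of Tune-A-Video, here is a minimal sketch (toy layers, not the repo's code; it assumes a PyTorch version that accepts the use_reentrant keyword) of how the reentrant checkpoint drops gradients when the checkpointed segment's input does not require grad:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

down = nn.Linear(4, 4)  # stands in for a checkpointed down block with trainable attn_temp weights
mid = nn.Linear(4, 4)   # stands in for the mid block, which is not checkpointed

x = torch.randn(2, 4)   # model input: requires_grad=False, as during training

# The reentrant implementation (PyTorch's default) replays a checkpointed
# segment in backward only if at least one of its input tensors requires grad.
# Here none does, so the segment is silently skipped: no warning, no error.
h = checkpoint(down, x, use_reentrant=True)
loss = mid(h).sum()     # the loss still has a grad_fn through `mid`
loss.backward()

print(down.weight.grad)  # None   -> the checkpointed block never gets updated
print(mid.weight.grad)   # tensor -> the non-checkpointed block trains normally
```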
Can someone confirm if this issue exists and provide a brief update?
Can you check if this issue still occurs when training on one GPU?
It also occurs when training on one GPU, with no warning or error. The trainable modules in down_blocks still have no grad.
Quick check:
unet.down_blocks[2].attentions[1].transformer_blocks[0].attn_temp.to_out[0].weight is always all zeros, and unet.down_blocks[2].attentions[1].transformer_blocks[0].attn_temp.to_out[0].weight.grad is always None.
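For reference, a sketch of that check after a training step (assuming unet is the UNet3DConditionModel instance loaded in the training script):

```python
# Sketch: `unet` is assumed to be the model instance from the training script.
w = unet.down_blocks[2].attentions[1].transformer_blocks[0].attn_temp.to_out[0].weight
print(w.detach().abs().sum().item())  # 0.0 -> presumably zero-initialized and never updated
print(w.grad)                         # None -> no gradient reaches the checkpointed down blocks
```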
I'm facing the same issue: the network does not seem to be updated during training. The gradients of the trainable modules are always zero. Can anyone resolve this problem?
It is because you enabled torch.utils.checkpoint to save GPU memory. Tune-A-Video calls torch.utils.checkpoint with its default parameter use_reentrant=True, and the reentrant implementation only backpropagates through a checkpointed segment when at least one of that segment's input tensors has requires_grad=True. So if you want the learnable modules inside a checkpointed block to be updated, you must ensure the block's input tensor has requires_grad=True. Why can the mid and up layers still be updated? Because the mid layers do not use torch.utils.checkpoint, their trainable parameters produce a hidden_states tensor with requires_grad=True, which in turn lets the up layers be updated through torch.utils.checkpoint.
The simplest way to solve this is to set 'gradient_checkpointing: False', if your GPU has sufficient memory🙃.
Good luck!
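If disabling checkpointing is not an option, here is a self-contained sketch of the two fixes implied above (toy layers, not Tune-A-Video code; Option A assumes a PyTorch version that accepts the use_reentrant keyword, as in the linked transformers issue):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

down = nn.Linear(4, 4)  # stands in for a trainable module inside a checkpointed down block
x = torch.randn(2, 4)   # frozen input: requires_grad=False, as in training

# Option A: non-reentrant checkpointing, the fix from the transformers issue.
h = checkpoint(down, x, use_reentrant=False)
h.sum().backward()
print(down.weight.grad is not None)  # True: gradients flow again

# Option B: keep the reentrant default, but hand the segment an input that
# requires grad, so the backward replay of the segment is not skipped.
down.weight.grad = None
h = checkpoint(down, x.clone().requires_grad_(True), use_reentrant=True)
h.sum().backward()
print(down.weight.grad is not None)  # True
```

In the repo itself, the equivalent change would go at the torch.utils.checkpoint.checkpoint call in unet_blocks.py linked above.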