
checkpoint_wrapper at the first block disabling gradients

Open YingChenlu opened this issue 3 years ago • 1 comment

🐛 Bug

If I use checkpoint_wrapper on the first block of a model, the gradients of its parameters are None after backward.

Code sample

import torch
import torch.nn as nn

from fairseq.modules.checkpoint_activations import checkpoint_wrapper

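# Wrap the first block with activation checkpointing.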
conv = checkpoint_wrapper(nn.Sequential(
    nn.Conv1d(1, 1, 1),
    nn.Conv1d(1, 1, 1)
))
fc = nn.Linear(1, 1)

x = torch.randn(2, 1, 1)
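# (x.requires_grad is False by default; this is what triggers the bug.)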

conv.train()
fc.train()

conv.zero_grad()
fc.zero_grad()

print(f'before forward, conv[0].weight.grad is {conv[0].weight.grad}')
print(f'before forward, fc.weight.grad is {fc.weight.grad}')

y = conv(x)
z = fc(y)
loss = z.mean()
loss.backward()

print(f'after backward, conv[0].weight.grad is {conv[0].weight.grad}')
print(f'after backward, fc.weight.grad is {fc.weight.grad}')
  1. Save the code above to bug.py and run python bug.py.
  2. stdout shows:
before forward, conv[0].weight.grad is None
before forward, fc.weight.grad is None
after backward, conv[0].weight.grad is None
after backward, fc.weight.grad is tensor([[-0.1495]])

The exact value of fc.weight.grad may differ from run to run, but it is not None.

Expected behavior

The gradients of the conv block's parameters should not be None after backward.

Environment

  • fairseq Version (e.g., 1.0 or master): 1.0.0a0+366974d
  • PyTorch Version (e.g., 1.0): 1.7.1+cu92
  • OS (e.g., Linux): Linux ubuntu 4.15.0-76-generic
  • How you installed fairseq (pip, source): git clone the repo and then pip install inside the repo
  • Build command you used (if compiling from source): None
  • Python version: 3.8.5
  • CUDA/cuDNN version: Not used
  • GPU models and configuration: Not used

Although this can be avoided by setting input.requires_grad = True or by not using checkpoint_wrapper on the first block, I wonder how it happens.

YingChenlu · May 31 '21 08:05
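For context on why this happens: fairseq's checkpoint_wrapper follows the same design as torch.utils.checkpoint, where the recomputed forward runs inside a custom autograd Function. An autograd Function's output only requires grad if at least one of its tensor inputs does, and the wrapped block's parameters are captured by closure rather than passed as inputs. When the first block's input is a plain tensor with requires_grad=False, the checkpointed output therefore does not require grad, backward never re-enters the block, and its parameters receive no gradients. A minimal sketch of the workaround mentioned above (same imports as the repro; an illustration, not an officially endorsed fix):

import torch
import torch.nn as nn

from fairseq.modules.checkpoint_activations import checkpoint_wrapper

conv = checkpoint_wrapper(nn.Sequential(
    nn.Conv1d(1, 1, 1),
    nn.Conv1d(1, 1, 1)
))

# Marking the input as requiring grad connects the checkpointed block
# to the autograd graph, so its parameters receive gradients.
x = torch.randn(2, 1, 1).requires_grad_(True)

conv(x).mean().backward()
print(conv[0].weight.grad)  # now a tensor, not None

In practice, simply not checkpointing the first block is often the cleaner option, since requiring grad on the input also makes autograd track a (useless) gradient path back to the data.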

@YingChenlu I have a similar question: do you know how to freeze parameters of a model in fairseq during training? I tried both zero_grad and requires_grad = False, but neither works well…

robotsp · Sep 08 '22 14:09
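A note on the freezing question above: zero_grad does not freeze anything, it only clears gradients that have already been computed. Setting requires_grad = False on the parameters does stop new gradients from being computed, and it is common practice to also exclude those parameters from the optimizer. A minimal sketch of the standard PyTorch pattern (the two-layer model here is illustrative, not a real fairseq model):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Freeze the first layer: autograd will not compute gradients for it.
for p in model[0].parameters():
    p.requires_grad = False

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1
)

x = torch.randn(8, 4)
model(x).sum().backward()
optimizer.step()

print(model[0].weight.grad)  # None: the frozen layer received no gradient
print(model[1].weight.grad)  # a tensor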