
No differentiated Tensors in the graph when using autograd.grad with functorch

Open · malessandro opened this issue 3 years ago · 1 comment

Hi all,

I have a problem when computing the vjp (with torch.autograd.grad) between the Jacobian of a neural network w.r.t. its parameters and a random tensor. The neural network evaluates the Hessian w.r.t. its inputs (with functorch) in the forward pass.

This is the error I got: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

And this is a minimal example to reproduce the error:

import torch
import torch.nn as nn
from functorch import vmap, hessian

class Net(nn.Module):
    def __init__(self, m):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(m, 100),
                                 nn.SiLU(),
                                 nn.Linear(100, 100),
                                 nn.SiLU(),
                                 nn.Linear(100, 1))

    def forward(self, x):
        # per-sample Hessian of the scalar output w.r.t. the input
        H = vmap(hessian(self.net), in_dims=0)(x)
        # trace of each Hessian (the Laplacian of the network output)
        trace = vmap(torch.trace, in_dims=0)(H.squeeze(1))
        return trace.unsqueeze(1)

net = Net(m=2).cuda()
y = net(torch.rand(10, 2).cuda())
v = torch.rand(10, 1).cuda()
d_param = torch.autograd.grad(y, list(net.parameters()), v)  # raises the error above

I don't get any errors when I compute gradients of a loss w.r.t. the network parameters using .backward().

Where am I going wrong?

malessandro · Aug 18 '22 15:08

The error "One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior." means that the derivative with respect to one of the parameters is 0. It does not necessarily mean that the tensor is unused in the computation, but that the place where it was used was not tracked by autograd (best guess: the add gets seen as a constant after the hessian computation, so it doesn't have a backward associated with it). This also checks out because it works with backward(), which allows tensors to have zero gradients (and from testing locally, the grad for that element is None).
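
A quick way to see this locally is to rerun the repro with backward() instead of autograd.grad. This is just a sketch, reusing the Net class and the same shapes and CUDA setup as the snippet above:

net = Net(m=2).cuda()
y = net(torch.rand(10, 2).cuda())
v = torch.rand(10, 1).cuda()

y.backward(v)  # backward() does not complain about parameters with no path to the output

for name, p in net.named_parameters():
    print(name, "grad is None" if p.grad is None else tuple(p.grad.shape))
# Every parameter gets a gradient except the final linear's bias, whose .grad stays None:
# the trace of the input Hessian does not depend on that bias, so autograd never records a use of it.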

To fix this there are a couple of options:

1. torch.autograd.grad(y, list(net.parameters()), v, allow_unused=True). If you do this you'll get None for the final gradient, corresponding to the bias of the final linear. You can treat that as 0 in subsequent computations (see the sketch after this list).
2. torch.autograd.grad(y, list(net.parameters())[:-1], v). If you do this, you won't get a gradient with respect to the final bias and should be careful to account for that. This is also less desirable because if you change this example, the gradient with respect to that bias may become non-zero and you would miss it.
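
Here is a minimal sketch of option (1), reusing the variables from the repro above (a fresh forward pass is assumed so the graph is available); replacing the None entry with zeros_like is just one convention for treating the missing gradient as 0:

params = list(net.parameters())
y = net(torch.rand(10, 2).cuda())
v = torch.rand(10, 1).cuda()

d_param = torch.autograd.grad(y, params, v, allow_unused=True)
# allow_unused=True returns None for the final bias; treat it as a zero gradient
d_param = tuple(torch.zeros_like(p) if g is None else g
                for g, p in zip(d_param, params))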

samdow · Aug 19 '22 16:08