Unnecessary gradient computation in LayerDeepLift
🐛 Bug
When using LayerDeepLift on a model, gradient computation is required even for layers that come before the layer on which I am computing attributions.
To Reproduce
Steps to reproduce the behavior:
- My model:
import torch
import torch.nn as nn
from transformers import AutoModel
from collections import OrderedDict


class MMModel(nn.Module):
    def __init__(self,
                 model_name,
                 model_dir,
                 dropout,
                 n_fc,
                 n_classes,
                 hidden_size=None):
        super().__init__()
        self.transformer = AutoModel.from_pretrained("roberta-base")
        # Freeze the transformer backbone
        for param in self.transformer.parameters():
            param.requires_grad = False
        self.drop = nn.Dropout(dropout)
        self.classifier = torch.nn.Sequential(OrderedDict([
            ("hidden_layer", nn.Linear(n_fc, hidden_size)),
            ("relu", nn.ReLU()),
            ("drop", nn.Dropout(dropout)),
            ("output_layer", nn.Linear(hidden_size, n_classes))
        ]))

    def forward(self, ids, mask):
        hidden_output, pooled_output = self.transformer(
            ids,
            attention_mask=mask,
            return_dict=False,
        )
        out = self.drop(pooled_output)
        out = self.classifier(out)
        return out
- Data example:
from transformers import RobertaTokenizer

# device is assumed to be defined, e.g.:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
inputs = (
    tokenizer(["any text here"], return_tensors='pt', max_length=512,
              padding='max_length', truncation=True)['input_ids'].to(device),
    tokenizer(["any text here"], return_tensors='pt', max_length=512,
              padding='max_length', truncation=True)['attention_mask'].to(device)
)
baselines = (
    tokenizer('', return_tensors='pt', max_length=512,
              padding='max_length', truncation=True)['input_ids'].to(device),
    tokenizer('', return_tensors='pt', max_length=512,
              padding='max_length', truncation=True)['attention_mask'].to(device)
)
- Explain the model:
from captum.attr import LayerDeepLift

# model is an MMModel instance (a placeholder construction is sketched after the error message)
ldl = LayerDeepLift(model, model.classifier)
attr = ldl.attribute(inputs=inputs,
                     baselines=baselines,
                     attribute_to_layer_input=True)
- Error Message:
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
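For completeness, the `model` object used above is assumed to be an instance of MMModel; the constructor arguments below are placeholders chosen only to make the snippet self-contained (768 matches the roberta-base pooled output size):

```python
# Placeholder hyperparameters, only to make the repro runnable;
# n_fc=768 matches the pooled output size of roberta-base.
model = MMModel(model_name="roberta-base", model_dir=None, dropout=0.1,
                n_fc=768, n_classes=2, hidden_size=128).to(device)
model.eval()
```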
Expected behavior
Since LayerDeepLift is used to compute attributions for a specific layer, one could expect that gradient computation is not needed for earlier layers. As DeepLift uses a backpropagation-based method to compute the multipliers, I would expect that only the multipliers between the output and the studied layer are computed.
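To illustrate the expectation with plain PyTorch (a minimal sketch using toy modules as stand-ins, not Captum internals): gradients with respect to the studied layer's input can be computed even when every upstream parameter is frozen, by treating the captured activation as the differentiation root.

```python
import torch
import torch.nn as nn

# Toy stand-ins: 'backbone' plays the role of the frozen transformer,
# 'head' plays the role of model.classifier.
backbone = nn.Linear(8, 4)
head = nn.Linear(4, 2)
for p in backbone.parameters():
    p.requires_grad = False

x = torch.randn(1, 8)
with torch.no_grad():
    layer_input = backbone(x)  # activation entering the studied layer

# Only the studied layer's input needs to require grad;
# no gradients are ever requested from the frozen backbone.
layer_input = layer_input.detach().requires_grad_(True)
out = head(layer_input)
grads = torch.autograd.grad(out.sum(), layer_input)[0]
print(grads.shape)  # torch.Size([1, 4])
```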
Environment
Describe the environment used for Captum
- Captum / PyTorch Version (e.g., 1.0 / 0.4.0): 0.6.0 / 1.12.1+cu102
- OS (e.g., Linux): Ubuntu 18.04.4 LTS
- How you installed Captum / PyTorch (`conda`, `pip`, source): pip
- Build command you used (if compiling from source):
- Python version: 3.10.11
- CUDA/cuDNN version: CUDA 10.2
- GPU models and configuration: Nvidia Quadro P6000
- Any other relevant information:
Additional context
Enabling gradient computation before computing attributions works:
for param in model.parameters():
    param.requires_grad = True
If these gradients are mandatory, this could be handled automatically when creating the LayerDeepLift object.
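In the meantime, a less intrusive variant of the same workaround (a sketch only): enable gradients just for the attribution call and restore the original `requires_grad` flags afterwards, so the frozen-transformer training setup is left untouched.

```python
# Sketch: temporarily enable gradients for the attribution call only,
# then restore the original requires_grad flags.
saved = {name: p.requires_grad for name, p in model.named_parameters()}
for p in model.parameters():
    p.requires_grad = True
try:
    attr = ldl.attribute(inputs=inputs,
                         baselines=baselines,
                         attribute_to_layer_input=True)
finally:
    for name, p in model.named_parameters():
        p.requires_grad = saved[name]
```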