Unnecessary gradient computation in LayerDeepLift
🐛 Bug
When using LayerDeepLift on a model, gradient computation is required even for layers that come before the layer on which I am computing attributions.
To Reproduce
Steps to reproduce the behavior:
- My model:
import torch
import torch.nn as nn
from transformers import AutoModel
from collections import OrderedDict


class MMModel(nn.Module):
    def __init__(self,
                 model_name,
                 model_dir,
                 dropout,
                 n_fc,
                 n_classes,
                 hidden_size=None):
        super().__init__()
        self.transformer = AutoModel.from_pretrained("roberta-base")
        # Freeze the transformer backbone
        for param in self.transformer.parameters():
            param.requires_grad = False
        self.drop = nn.Dropout(dropout)
        self.classifier = torch.nn.Sequential(OrderedDict([
            ("hidden_layer", nn.Linear(n_fc, hidden_size)),
            ("relu", nn.ReLU()),
            ("drop", nn.Dropout(dropout)),
            ("output_layer", nn.Linear(hidden_size, n_classes))
        ]))

    def forward(self, ids, mask):
        hidden_output, pooled_output = self.transformer(
            ids,
            attention_mask=mask,
            return_dict=False,
        )
        out = self.drop(pooled_output)
        out = self.classifier(out)
        return out
- Data example:
from transformers import RobertaTokenizer

# device is assumed to be defined, e.g.:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
inputs = (
    tokenizer(["any text here"], return_tensors='pt', max_length=512,
              padding='max_length', truncation=True)['input_ids'].to(device),
    tokenizer(["any text here"], return_tensors='pt', max_length=512,
              padding='max_length', truncation=True)['attention_mask'].to(device)
)
baselines = (
    tokenizer('', return_tensors='pt', max_length=512,
              padding='max_length', truncation=True)['input_ids'].to(device),
    tokenizer('', return_tensors='pt', max_length=512,
              padding='max_length', truncation=True)['attention_mask'].to(device)
)
- Explain the model:
from captum.attr import LayerDeepLift

# model is an MMModel instance (a placeholder construction is sketched after the error message)
ldl = LayerDeepLift(model, model.classifier)
attr = ldl.attribute(inputs=inputs,
                     baselines=baselines,
                     attribute_to_layer_input=True)
- Error Message:
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
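For completeness, the `model` object used above is assumed to be an instance of MMModel; the constructor arguments below are placeholders chosen only to make the snippet self-contained (768 matches the roberta-base pooled output size):

```python
# Placeholder hyperparameters, only to make the repro runnable;
# n_fc=768 matches the pooled output size of roberta-base.
model = MMModel(model_name="roberta-base", model_dir=None, dropout=0.1,
                n_fc=768, n_classes=2, hidden_size=128).to(device)
model.eval()
```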
Expected behavior
Since LayerDeepLift is used to compute attributions for a specific layer, one could expect that gradient computation is not needed for earlier layers. As DeepLift uses a backpropagation-based method to compute the multipliers, I would expect that only the multipliers between the output and the studied layer are computed.
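To illustrate the expectation with plain PyTorch (a minimal sketch using toy modules as stand-ins, not Captum internals): gradients with respect to the studied layer's input can be computed even when every upstream parameter is frozen, by treating the captured activation as the differentiation root.

```python
import torch
import torch.nn as nn

# Toy stand-ins: 'backbone' plays the role of the frozen transformer,
# 'head' plays the role of model.classifier.
backbone = nn.Linear(8, 4)
head = nn.Linear(4, 2)
for p in backbone.parameters():
    p.requires_grad = False

x = torch.randn(1, 8)
with torch.no_grad():
    layer_input = backbone(x)  # activation entering the studied layer

# Only the studied layer's input needs to require grad;
# no gradients are ever requested from the frozen backbone.
layer_input = layer_input.detach().requires_grad_(True)
out = head(layer_input)
grads = torch.autograd.grad(out.sum(), layer_input)[0]
print(grads.shape)  # torch.Size([1, 4])
```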
Environment
Describe the environment used for Captum
- Captum / PyTorch Version (e.g., 1.0 / 0.4.0): 0.6.0 / 1.12.1+cu102
- OS (e.g., Linux): Ubuntu 18.04.4 LTS
- How you installed Captum / PyTorch (`conda`, `pip`, source): pip
- Build command you used (if compiling from source):
- Python version: 3.10.11
- CUDA/cuDNN version: CUDA 10.2
- GPU models and configuration: Nvidia Quadro P6000
- Any other relevant information:
Additional context
Enabling gradient computation before computing attributions works:
for param in model.parameters():
    param.requires_grad = True
If these gradients are mandatory, this could be handled automatically when creating the LayerDeepLift object.
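In the meantime, a less intrusive variant of the same workaround (a sketch only): enable gradients just for the attribution call and restore the original `requires_grad` flags afterwards, so the frozen-transformer training setup is left untouched.

```python
# Sketch: temporarily enable gradients for the attribution call only,
# then restore the original requires_grad flags.
saved = {name: p.requires_grad for name, p in model.named_parameters()}
for p in model.parameters():
    p.requires_grad = True
try:
    attr = ldl.attribute(inputs=inputs,
                         baselines=baselines,
                         attribute_to_layer_input=True)
finally:
    for name, p in model.named_parameters():
        p.requires_grad = saved[name]
```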