
How to use Captum for sentence classification

Hossein-1991 opened this issue 2 years ago • 11 comments

Hello, I'm new to Captum and have a couple of questions about it (I couldn't find relevant tutorials). I want to build a BERT network with a classifier on top, something like this:

import torch.nn as nn
from transformers import BertModel

class Bert(nn.Module):
  def __init__(self):
    super(Bert, self).__init__()
    self.bert = BertModel.from_pretrained('bert-base-uncased')
    self.drop = nn.Dropout(0.5)
    self.fc = nn.Linear(768, 5)  # 5 output classes

  def forward(self, ids, masks):
    # with return_dict=False, BertModel returns (sequence_output, pooled_output)
    _, cls = self.bert(input_ids=ids, attention_mask=masks, return_dict=False)
    y = self.drop(cls)
    y = self.fc(y)
    return y

model = Bert()

As you can see, we need to feed the model the ids and masks of a given text. For my example, suppose we have some text and tokenize it using the BERT tokenizer:

from transformers import BertTokenizer 
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

We can generate an example with the tokenizer:

from torch.utils.data import TensorDataset, DataLoader

A = ['this is an example', 'this is not an example']
tokens = tokenizer(A, max_length=512, padding='max_length', truncation=True, return_tensors='pt')
dataset = TensorDataset(tokens.input_ids, tokens.attention_mask)
data_loader = DataLoader(dataset, batch_size=1)

Now it is time to set up Captum! This is how I am trying to run it (where i and j are read from the data loader):

import torch.nn.functional as F
from captum.attr import IntegratedGradients

def predict(x, y):
  y = model(ids=x.long(), masks=y.long())
  return F.softmax(y, dim=-1)

integrated_gradients = IntegratedGradients(predict)
attributions_ig = integrated_gradients.attribute((i, j), target=0)

Now my questions are:

  1. Am I setting up Captum correctly? Is everything right?!
  2. When I run this code, it raises this error: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior. How should I fix that?

Many thanks in advance.

Hossein-1991 avatar Jan 31 '23 20:01 Hossein-1991

@NarineK Hi, sorry for bothering you. You seem to be the person with the most knowledge here :) Could you please help me with this?

Hossein-1991 avatar Feb 02 '23 14:02 Hossein-1991

@Hossein-1991 can you paste the error stack trace for me, to confirm where the error is thrown?

You are using IntegratedGradients, but your input tokens (word ids) are not used in a differentiable operation (e.g., the embedding lookup). Moreover, long tensors won't have gradients: https://discuss.pytorch.org/t/tensor-long-lose-requires-grad/135218

Consider using other algorithms, or check this tutorial on using gradients for text: https://captum.ai/tutorials/IMDB_TorchText_Interpret
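
For instance, here is a rough, untested sketch of what that could look like with your Bert wrapper, assuming i and j are the input_ids and attention_mask batches from your data loader. LayerIntegratedGradients attributes to the output of the embedding layer, which is a float tensor, so the long token IDs never need gradients:

import torch.nn.functional as F
from captum.attr import LayerIntegratedGradients

def predict(ids, masks):
    return F.softmax(model(ids, masks), dim=-1)

# attribute with respect to the embedding layer's output instead of the
# (long, non-differentiable) token IDs; the attention mask is passed along
# unchanged as an additional forward argument
lig = LayerIntegratedGradients(predict, model.bert.embeddings)
attributions = lig.attribute(inputs=i, additional_forward_args=j, target=0)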

aobo-y avatar Feb 03 '23 20:02 aobo-y

> @Hossein-1991 can you paste the error stack trace for me, to confirm where the error is thrown?
>
> You are using IntegratedGradients, but your input tokens (word ids) are not used in a differentiable operation (e.g., the embedding lookup). Moreover, long tensors won't have gradients: https://discuss.pytorch.org/t/tensor-long-lose-requires-grad/135218
>
> Consider using other algorithms, or check this tutorial on using gradients for text: https://captum.ai/tutorials/IMDB_TorchText_Interpret

Thank you for your reply. Actually, when I don't convert the tensors to long, it raises an error again (I don't remember the exact wording, but it says that Captum needs int inputs instead of float). I think (as you recommended) I must choose LayerIntegratedGradients, but it is still a bit confusing to me, because even the example you referred to doesn't use BERT-based networks. So far, I couldn't find any example of using BERT models and Captum together for a sentence classification task.

Hossein-1991 avatar Feb 04 '23 11:02 Hossein-1991

I'm actually facing the same problem. @Hossein-1991, could you show me how you managed to convert your tensors to long? I'd also like to know whether you were able to implement LayerIntegratedGradients. Thanks!

abderrahmane-mhd avatar Feb 06 '23 10:02 abderrahmane-mhd

> I'm actually facing the same problem. @Hossein-1991, could you show me how you managed to convert your tensors to long? I'd also like to know whether you were able to implement LayerIntegratedGradients. Thanks!

I just added .long() to the end of each tensor. As for LayerIntegratedGradients, actually no! It still has several ambiguities for me, and unfortunately the tutorials are not enough. There is another library named transformers-interpret (follow this link) that is built on top of Captum and is easier to work with, but I still couldn't customize it.
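
Its basic usage looks roughly like this (a sketch from memory of its README; the model here would need to be a Hugging Face sequence classification model with a matching tokenizer):

from transformers_interpret import SequenceClassificationExplainer

# wraps a Hugging Face model + tokenizer and runs Captum attribution internally
cls_explainer = SequenceClassificationExplainer(model, tokenizer)
word_attributions = cls_explainer("this is an example")  # per-word attribution scores
cls_explainer.visualize()  # render the attributions in a notebook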

Hossein-1991 avatar Feb 06 '23 12:02 Hossein-1991

@Hossein-1991 First of all, based on your code, you only converted your tensors to long for your model, not for Captum: y = model(ids = x.long(), masks = y.long()). So if you receive any errors about data types, it's because your model requires them, not Captum. What Captum received is still the original i and j: integrated_gradients.attribute((i,j), target=0)

Again, if you believe Captum caused any data-type bugs, please attach the error log for us.

Second, have you gone through our tutorials? We do have tutorials about Bert, e.g., https://github.com/pytorch/captum/blob/master/tutorials/Bert_SQUAD_Interpret.ipynb

But anyway, Bert or not, the model architecture is not that important, as the model is treated as a black box. If you still have any issues, please feel free to post more details, like what exactly does not work for you.

aobo-y avatar Feb 07 '23 02:02 aobo-y

Here is a minimal example with Captum and sentence classification.

Hope this helps:

import torch
from captum.attr import LayerIntegratedGradients
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "citizenlab/twitter-xlm-roberta-base-sentiment-finetunned"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()
model.zero_grad()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# forward function that returns one scalar per example (the maximum logit),
# so attribute() does not need a target index
def predict(input_ids):
    outputs = model(input_ids)
    return outputs.logits.max(1).values

# use the pad token as the reference (baseline) token
ref_token_id = tokenizer.pad_token_id

def construct_input_ref_pair(text, ref_token_id):
    input_ids = tokenizer.encode(text, return_tensors='pt')
    # construct reference token ids of the same shape, filled with the pad token
    ref_input_ids = torch.zeros_like(input_ids)
    ref_input_ids[:] = ref_token_id
    return input_ids, ref_input_ids

# sample French tweet (the model is multilingual)
text = "#AppolinedeMalherbe vient de nous démontrer qu'elle n'avait Rien d'une journaliste objective. Aggressive, prétentieuse, elle n'écoute pas et veut faire SA politique ! Merci @GDarmanin de ne pas être rentré ds son jeu de provocation ! @BFMTV @CNEWS"

input_ids, ref_input_ids = construct_input_ref_pair(text, ref_token_id)

# attribute with respect to the output of the embedding layer
lig = LayerIntegratedGradients(predict, model.roberta.embeddings)
attributions_start, delta_start = lig.attribute(inputs=input_ids,
                                                baselines=ref_input_ids,
                                                return_convergence_delta=True)
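
One design note: because predict returns a single scalar per example, attribute() needs no target here. To explain a specific class instead, return outputs.logits from predict and pass target=<class index> to attribute().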

paulgay avatar Feb 07 '23 22:02 paulgay

> Here is a minimal example with Captum and sentence classification […]

Thank you for your wonderful example (though I think you forgot to add a target in the last call). Now I have two questions:

  1. Aren't the attention masks necessary for the computation?
  2. How can I visualize the results? I sketched an attempt below. Thanks
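
For question 2, I guess something like the following might work, based on the IMDB tutorial (just a sketch using the variables from your example; the probability and class labels are placeholders since I don't know the real ones):

import torch
from captum.attr import visualization as viz

# summarize per-token attributions: sum over the embedding dimension, then normalize
attrs = attributions_start.sum(dim=-1).squeeze(0)
attrs = attrs / torch.norm(attrs)
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])

record = viz.VisualizationDataRecord(
    attrs,          # per-token attribution scores
    0.0,            # predicted probability (placeholder)
    "pred",         # predicted class label (placeholder)
    "true",         # true class label (placeholder)
    "attr",         # attributed class label (placeholder)
    attrs.sum(),    # total attribution score
    tokens,         # raw tokens to display
    delta_start)    # convergence delta
viz.visualize_text([record])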

Hossein-1991 avatar Feb 09 '23 10:02 Hossein-1991

> @Hossein-1991 First of all, based on your code, you only converted your tensors to long for your model, not for Captum […] Again, if you believe Captum caused any data-type bugs, please attach the error log for us.

When I just run my model with i and j, it gives no errors. For example, the output of the code below:

model(ids = i, masks = j)

is: tensor([[-0.2446, 0.5744, -0.4547, 0.1483, 0.5559]], grad_fn=<AddmmBackward0>)

But when I run it through the predict function:

def predict(x,y):
  y = model(ids = x, masks = y)
  return F.softmax(y, dim=-1)

integrated_gradients = IntegratedGradients(predict)
attributions_ig = integrated_gradients.attribute((i,j), target=0)

it gives this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-871a0069ab99> in <module>
      4 
      5 integrated_gradients = IntegratedGradients(predict)
----> 6 attributions_ig = integrated_gradients.attribute((i,j), target=0)

14 frames
/usr/local/lib/python3.8/dist-packages/captum/log/__init__.py in wrapper(*args, **kwargs)
     40             @wraps(func)
     41             def wrapper(*args, **kwargs):
---> 42                 return func(*args, **kwargs)
     43 
     44             return wrapper

/usr/local/lib/python3.8/dist-packages/captum/attr/_core/integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta)
    284             )
    285         else:
--> 286             attributions = self._attribute(
    287                 inputs=inputs,
    288                 baselines=baselines,

/usr/local/lib/python3.8/dist-packages/captum/attr/_core/integrated_gradients.py in _attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, step_sizes_and_alphas)
    349 
    350         # grads: dim -> (bsz * #steps x inputs[0].shape[1:], ...)
--> 351         grads = self.gradient_func(
    352             forward_fn=self.forward_func,
    353             inputs=scaled_features_tpl,

/usr/local/lib/python3.8/dist-packages/captum/_utils/gradient.py in compute_gradients(forward_fn, inputs, target_ind, additional_forward_args)
    110     with torch.autograd.set_grad_enabled(True):
    111         # runs forward pass
--> 112         outputs = _run_forward(forward_fn, inputs, target_ind, additional_forward_args)
    113         assert outputs[0].numel() == 1, (
    114             "Target not provided when necessary, cannot"

/usr/local/lib/python3.8/dist-packages/captum/_utils/common.py in _run_forward(forward_func, inputs, target, additional_forward_args)
    480     additional_forward_args = _format_additional_forward_args(additional_forward_args)
    481 
--> 482     output = forward_func(
    483         *(*inputs, *additional_forward_args)
    484         if additional_forward_args is not None

<ipython-input-11-871a0069ab99> in predict(x, y)
      1 def predict(x,y):
----> 2   y = model(ids = x, masks = y)
      3   return F.softmax(y, dim=-1)
      4 
      5 integrated_gradients = IntegratedGradients(predict)

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

<ipython-input-4-d39762b0fb5a> in forward(self, ids, masks)
      8     self.fc = nn.Linear(768,5)
      9   def forward(self,ids,masks):
---> 10     _ , cls = self.bert(input_ids = ids, attention_mask = masks, return_dict = False)
     11     y = self.drop(cls)
     12     y = self.fc(y)

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
   1010         head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
   1011 
-> 1012         embedding_output = self.embeddings(
   1013             input_ids=input_ids,
   1014             position_ids=position_ids,

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    228 
    229         if inputs_embeds is None:
--> 230             inputs_embeds = self.word_embeddings(input_ids)
    231         token_type_embeddings = self.token_type_embeddings(token_type_ids)
    232 

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/sparse.py in forward(self, input)
    158 
    159     def forward(self, input: Tensor) -> Tensor:
--> 160         return F.embedding(
    161             input, self.weight, self.padding_idx, self.max_norm,
    162             self.norm_type, self.scale_grad_by_freq, self.sparse)

/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2208         # remove once script supports set_grad_enabled
   2209         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   2211 
   2212 

RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)

Hossein-1991 avatar Feb 09 '23 11:02 Hossein-1991

@Hossein-1991 Please pay attention to this part of your log:

/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
   1010         head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
   1011 
-> 1012         embedding_output = self.embeddings(
   1013             input_ids=input_ids,
   1014             position_ids=position_ids,

The error is raised by your Bert model (transformers/models/bert/modeling_bert.py). The model requires Long or Int, which makes sense, as it expects you to pass word token IDs. This is not a requirement of Captum.

If your model works with model(ids = i, masks = j), that's because your i is already Long. Can you double-check the data types of i?

I would suggest carefully going through the documentation of the Bert model you use, if you haven't already, to understand what goes into the model: https://huggingface.co/docs/transformers/v4.26.1/en/model_doc/bert#transformers.BertModel.forward . It explains the inputs and their types in detail; input_ids should be a LongTensor.

As I explained at the beginning,

> long tensors won't have gradients

so you cannot use Captum's IntegratedGradients here. Please check other methods like LayerIntegratedGradients.
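
To illustrate that point with a tiny standalone snippet (not specific to Captum):

import torch

ids = torch.tensor([101, 2023, 2003, 102])  # Long token IDs, as a tokenizer produces
try:
    ids.requires_grad_(True)  # integer tensors cannot carry gradients
except RuntimeError as e:
    print(e)  # "only Tensors of floating point and complex dtype can require gradients"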

aobo-y avatar Feb 11 '23 00:02 aobo-y

@aobo-y I have long texts (each about 2,000 tokens) and want to extract keywords (and phrases) from them. Which method do you suggest?

Hossein-1991 avatar Feb 11 '23 11:02 Hossein-1991