How to use Captum for sentence classification
Hello, I'm new to Captum and have a couple of questions about it (I couldn't find relevant tutorials). I'm going to design a BERT network with a classifier on top. Something like this:
import torch.nn as nn
from transformers import BertModel

class Bert(nn.Module):
    def __init__(self):
        super(Bert, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.drop = nn.Dropout(0.5)
        self.fc = nn.Linear(768, 5)

    def forward(self, ids, masks):
        _, cls = self.bert(input_ids=ids, attention_mask=masks, return_dict=False)
        y = self.drop(cls)
        y = self.fc(y)
        return y

model = Bert()
model = Bert()
As is clear, we need to feed the ids and masks of a given text. For my example, suppose we have a text and tokenize it with the BERT tokenizer:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
We can generate an example with the tokenizer:
from torch.utils.data import TensorDataset, DataLoader

A = ['this is an example', 'this is not an example']
tokens = tokenizer(A, max_length=512, padding='max_length', truncation=True, return_tensors='pt')
dataset = TensorDataset(tokens.input_ids, tokens.attention_mask)
data_loader = DataLoader(dataset, batch_size=1)
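For example, a single batch of i and j (referenced below) can be drawn from this loader like so (a minimal sketch; the shapes follow from batch_size=1 and max_length=512):

for i, j in data_loader:
    # i: input_ids of shape (1, 512); j: attention_mask of shape (1, 512)
    break  # keep just the first batch for the demonstration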
Now it is time to set up Captum! This is how I am trying to run it (where i and j are read from the data loader):
import torch.nn.functional as F
from captum.attr import IntegratedGradients

def predict(x, y):
    y = model(ids=x.long(), masks=y.long())
    return F.softmax(y, dim=-1)

integrated_gradients = IntegratedGradients(predict)
attributions_ig = integrated_gradients.attribute((i, j), target=0)
Now my questions are:
- Am I setting up Captum correctly? Is everything right?
- When I run this code, it raises this error:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
How should I fix that?
Many thanks in advance.
@NarineK Hi, sorry for bothering you. Apparently you're the one who knows this best :) Could you please help me with this?
@Hossein-1991 can you paste the error stack trace for me to confirm where the error is thrown?
You are using IntegratedGradients, but your input tokens (word ids) are not used in a differentiable operation (e.g., the embedding lookup). Moreover, long tensors won't have gradients: https://discuss.pytorch.org/t/tensor-long-lose-requires-grad/135218
Consider using other algorithms, or check this tutorial on using gradients for text: https://captum.ai/tutorials/IMDB_TorchText_Interpret
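For concreteness, here is a minimal sketch (untested; it assumes the model, tokenizer, i, and j defined in the question above, and predict here is just a thin wrapper) of how LayerIntegratedGradients could be attached to the embedding layer, so the integration runs over the differentiable embedding outputs instead of the integer token ids:

import torch
from captum.attr import LayerIntegratedGradients

def predict(ids, masks):
    return model(ids, masks)  # the Bert wrapper from the question

# hook the embedding module of the underlying BertModel
lig = LayerIntegratedGradients(predict, model.bert.embeddings)

# a baseline of pad tokens is a common choice for token-id inputs
baseline_ids = torch.full_like(i, tokenizer.pad_token_id)

# the mask goes through additional_forward_args so it is not perturbed
attributions = lig.attribute(inputs=i,
                             baselines=baseline_ids,
                             additional_forward_args=(j,),
                             target=0)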
Thank you for your reply.
Actually, when I don't convert the tensors to long, it raises another error (I don't remember the exact wording, but it says int inputs are needed instead of float).
I think (as you recommended) I must choose LayerIntegratedGradients. But it is still a bit confusing to me, because even the example you referred to doesn't use a BERT-based network.
So far, I couldn't find any example of using BERT models and Captum together (for a sentence classification task).
I'm actually facing the same problem. @Hossein-1991, could you show me how you managed to convert your tensors to long? I also want to know whether or not you could implement LayerIntegratedGradients. Thanks!
I just appended .long() to each tensor.
Hmm, actually no! There are several ambiguities, and unfortunately the tutorials are not enough. There is another library named transformers-interpret (follow this link) that is built on top of Captum and is easier to work with, but I still couldn't customize it.
@Hossein-1991
First of all, based on your code, you converted your tensors to long only for your model, not for Captum:
y = model(ids = x.long(), masks = y.long())
So if you receive any errors about data types, that's because your model requires it, not Captum. What Captum received is still the original i and j: integrated_gradients.attribute((i,j), target=0)
Again, if you believe Captum caused any data type bugs, please attach the error log for us.
Second, have you gone through our tutorials? We do have tutorials about Bert, e.g., https://github.com/pytorch/captum/blob/master/tutorials/Bert_SQUAD_Interpret.ipynb
But anyway, Bert or not, the model architecture is not that important, as it is treated as a black box. If you still have any issues, please feel free to post more details, like what exactly does not work for you.
Here is a minimal example with Captum and sentence classification.
Hope this helps:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "citizenlab/twitter-xlm-roberta-base-sentiment-finetunned"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()
model.zero_grad()
tokenizer = AutoTokenizer.from_pretrained(model_name)

def predict(input_ids):
    outputs = model(input_ids)
    return outputs.logits.max(1).values

ref_token_id = tokenizer.pad_token_id

def construct_input_ref_pair(text, ref_token_id):
    input_ids = tokenizer.encode(text, return_tensors='pt')
    # construct reference token ids
    ref_input_ids = torch.zeros_like(input_ids)
    ref_input_ids[:] = ref_token_id
    return input_ids, ref_input_ids

text = "#AppolinedeMalherbe vient de nous démontrer qu'elle n'avait Rien d'une journaliste objective. Aggressive, prétentieuse, elle n'écoute pas et veut faire SA politique ! Merci @GDarmanin de ne pas être rentré ds son jeu de provocation ! @BFMTV @CNEWS"
input_ids, ref_input_ids = construct_input_ref_pair(text, ref_token_id)

from captum.attr import LayerIntegratedGradients

lig = LayerIntegratedGradients(predict, model.roberta.embeddings)
attributions_start, delta_start = lig.attribute(inputs=input_ids,
                                                baselines=ref_input_ids,
                                                return_convergence_delta=True)
Thank you for your wonderful example (though I think you forgot to add target to the last line).
Now, I have two questions:
- Aren't mask tokens necessary for the computations?
- How can I visualize the results? Thanks
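For what it's worth, here is a hedged sketch of both points, extending the example above (untested; predict_with_mask is an illustrative name I introduce, and the attention mask is passed through additional_forward_args so Captum does not perturb it):

import torch
from captum.attr import LayerIntegratedGradients
from captum.attr import visualization as viz

def predict_with_mask(input_ids, attention_mask):
    return model(input_ids, attention_mask=attention_mask).logits.max(1).values

lig = LayerIntegratedGradients(predict_with_mask, model.roberta.embeddings)

enc = tokenizer(text, return_tensors='pt')
ref_input_ids = torch.full_like(enc.input_ids, ref_token_id)

attributions, delta = lig.attribute(inputs=enc.input_ids,
                                    baselines=ref_input_ids,
                                    additional_forward_args=(enc.attention_mask,),
                                    return_convergence_delta=True)

# collapse the embedding dimension into one score per token, then normalize
scores = attributions.sum(dim=-1).squeeze(0)
scores = scores / torch.norm(scores)

tokens = tokenizer.convert_ids_to_tokens(enc.input_ids.squeeze(0))
record = viz.VisualizationDataRecord(
    scores,        # per-token attributions
    0, 0, 0, 0,    # placeholder prediction/label metadata for this demo
    scores.sum(),  # total attribution score
    tokens,
    delta)
viz.visualize_text([record])  # renders an HTML heat-map in a notebook

As for the mask: with a single unpadded sentence, the model's default all-ones mask gives the same result, but once you batch padded inputs you should pass the mask as above.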
When I just run my model with i and j, it gives no errors. For example, the output of the code below:
model(ids = i, masks = j)
is:
tensor([[-0.2446, 0.5744, -0.4547, 0.1483, 0.5559]], grad_fn=<AddmmBackward0>)
But when I run it through the predict function:
def predict(x,y):
    y = model(ids = x, masks = y)
    return F.softmax(y, dim=-1)

integrated_gradients = IntegratedGradients(predict)
attributions_ig = integrated_gradients.attribute((i,j), target=0)
it gives this error:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-871a0069ab99> in <module>
      4 
      5 integrated_gradients = IntegratedGradients(predict)
----> 6 attributions_ig = integrated_gradients.attribute((i,j), target=0)

14 frames
/usr/local/lib/python3.8/dist-packages/captum/log/__init__.py in wrapper(*args, **kwargs)
     40     @wraps(func)
     41     def wrapper(*args, **kwargs):
---> 42         return func(*args, **kwargs)
     43 
     44     return wrapper

/usr/local/lib/python3.8/dist-packages/captum/attr/_core/integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta)
    284         )
    285     else:
--> 286         attributions = self._attribute(
    287             inputs=inputs,
    288             baselines=baselines,

/usr/local/lib/python3.8/dist-packages/captum/attr/_core/integrated_gradients.py in _attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, step_sizes_and_alphas)
    349 
    350         # grads: dim -> (bsz * #steps x inputs[0].shape[1:], ...)
--> 351         grads = self.gradient_func(
    352             forward_fn=self.forward_func,
    353             inputs=scaled_features_tpl,

/usr/local/lib/python3.8/dist-packages/captum/_utils/gradient.py in compute_gradients(forward_fn, inputs, target_ind, additional_forward_args)
    110     with torch.autograd.set_grad_enabled(True):
    111         # runs forward pass
--> 112         outputs = _run_forward(forward_fn, inputs, target_ind, additional_forward_args)
    113         assert outputs[0].numel() == 1, (
    114             "Target not provided when necessary, cannot"

/usr/local/lib/python3.8/dist-packages/captum/_utils/common.py in _run_forward(forward_func, inputs, target, additional_forward_args)
    480     additional_forward_args = _format_additional_forward_args(additional_forward_args)
    481 
--> 482     output = forward_func(
    483         *(*inputs, *additional_forward_args)
    484         if additional_forward_args is not None

<ipython-input-11-871a0069ab99> in predict(x, y)
      1 def predict(x,y):
----> 2     y = model(ids = x, masks = y)
      3     return F.softmax(y, dim=-1)
      4 
      5 integrated_gradients = IntegratedGradients(predict)

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

<ipython-input-4-d39762b0fb5a> in forward(self, ids, masks)
      8         self.fc = nn.Linear(768,5)
      9     def forward(self,ids,masks):
---> 10         _ , cls = self.bert(input_ids = ids, attention_mask = masks, return_dict = False)
     11         y = self.drop(cls)
     12         y = self.fc(y)

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
   1010         head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
   1011 
-> 1012         embedding_output = self.embeddings(
   1013             input_ids=input_ids,
   1014             position_ids=position_ids,

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    228 
    229         if inputs_embeds is None:
--> 230             inputs_embeds = self.word_embeddings(input_ids)
    231         token_type_embeddings = self.token_type_embeddings(token_type_ids)
    232 

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/sparse.py in forward(self, input)
    158 
    159     def forward(self, input: Tensor) -> Tensor:
--> 160         return F.embedding(
    161             input, self.weight, self.padding_idx, self.max_norm,
    162             self.norm_type, self.scale_grad_by_freq, self.sparse)

/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2208     # remove once script supports set_grad_enabled
   2209     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   2211 
   2212 

RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
@Hossein-1991 Please pay attention to this part of your log
/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
1010 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
1011
-> 1012 embedding_output = self.embeddings(
1013 input_ids=input_ids,
1014 position_ids=position_ids,
The error is raised by your BERT model (transformers/models/bert/modeling_bert.py). The model requires Long or Int, which makes sense, as it expects you to pass word token IDs. This is not a requirement of Captum.
If your model works with model(ids = i, masks = j), that's because your i is already Long. Can you double-check the data type of i?
I would suggest carefully going through the documentation of the BERT model you use, if you haven't already, to understand what goes into the model: https://huggingface.co/docs/transformers/v4.26.1/en/model_doc/bert#transformers.BertModel.forward . It explains the inputs and their types in detail; input_ids should be a LongTensor.
As I explained at the beginning, long tensors won't have gradients, so you cannot use Captum's IntegratedGradients. Please check other methods like LayerIntegratedGradients.
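To see why, a quick plain-PyTorch illustration (a minimal sketch, independent of Captum):

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
print(x.requires_grad)         # True: float tensors can track gradients
print(x.long().requires_grad)  # False: casting to long detaches the tensor from autograd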
@aobo-y I have long texts (each about 2000 tokens) and want to extract keywords (and phrases). Which method do you suggest?