XLNet fails with attn_type "uni"
System Info
- transformers version: 4.26.1
- Platform: Linux-5.10.104-linuxkit-aarch64-with-glibc2.17
- Python version: 3.8.16
- Huggingface_hub version: 0.12.0
- PyTorch version (GPU?): 1.13.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help?
@thomwolf
Information
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("xlnet-base-cased")
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
# Switch the XLNet backbone to unidirectional ("uni") attention
model.transformer.attn_type = "uni"
inputs = tokenizer(["Hello, my dog is cute", "Hello, my dog is cute too"], return_tensors="pt", padding=True)
print(inputs)
outputs = model(**inputs)
Printed inputs, followed by the error:
{'input_ids': tensor([[ 5, 17, 11368, 19, 94, 2288, 27, 10920, 4, 3],
[ 17, 11368, 19, 94, 2288, 27, 10920, 269, 4, 3]]), 'token_type_ids': tensor([[3, 0, 0, 0, 0, 0, 0, 0, 0, 2],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 2]]), 'attention_mask': tensor([[0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
Traceback (most recent call last):
File "xlnet.py", line 70, in <module>
outputs = model(**inputs)
File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/vscode/.local/lib/python3.8/site-packages/transformers/models/xlnet/modeling_xlnet.py", line 1547, in forward
transformer_outputs = self.transformer(
File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/vscode/.local/lib/python3.8/site-packages/transformers/models/xlnet/modeling_xlnet.py", line 1161, in forward
attn_mask += data_mask[:, :, :, None]
RuntimeError: output with shape [10, 10, 1, 1] doesn't match the broadcast shape [10, 10, 2, 1]
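For context, the failing line is an in-place add whose right-hand side broadcasts to a shape larger than the destination: with attn_type "uni" the causal attn_mask keeps a singleton batch dimension, while the padding-derived data_mask carries the real batch size. A minimal sketch of the same broadcasting rule in plain PyTorch (shapes copied from the traceback above, variable names reused for illustration only):

import torch

# attn_mask after create_mask(qlen, mlen)[:, :, None, None]: [qlen, qlen, 1, 1]
attn_mask = torch.zeros(10, 10, 1, 1)
# data_mask[:, :, :, None] carries the batch size: [1, qlen, bsz, 1] with bsz = 2
data_mask = torch.zeros(1, 10, 2, 1)

# An out-of-place add broadcasts to [10, 10, 2, 1] and succeeds.
print((attn_mask + data_mask).shape)  # torch.Size([10, 10, 2, 1])

# An in-place add cannot enlarge the destination's size-1 dimension,
# which is the RuntimeError raised in modeling_xlnet.py line 1161.
attn_mask += data_mask  # RuntimeError: output with shape [10, 10, 1, 1] ...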
Expected behavior
A successful forward pass with the unidirectional (causal) attention mask applied.
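One hedged workaround until this is fixed: run the model one sequence at a time, so the padding-derived mask keeps a batch dimension of 1 and the in-place add above still broadcasts. This only sidesteps the crash; I have not checked whether the "uni" masks are otherwise what you want.

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("xlnet-base-cased")
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model.transformer.attn_type = "uni"

# A single sequence keeps data_mask at batch size 1, so the broadcast fits.
single = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**single)
print(outputs.logits.shape)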
cc @ArthurZucker and @younesbelkada
This is a fairly old model 😅 It does make sense to drop "uni": first because it is not working and has not bothered anyone until now, but also because we can just redirect people to TransformerXL. Thanks for reporting!
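For anyone following that redirect, a minimal sketch of loading Transformer-XL instead (assuming the transfo-xl-wt103 checkpoint; Transformer-XL is causal by design, so there is no attn_type switch, but it is not a drop-in replacement for an XLNet classifier):

from transformers import TransfoXLModel, TransfoXLTokenizer

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLModel.from_pretrained("transfo-xl-wt103")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
# Transformer-XL's forward does not take an attention_mask, so pass input_ids only.
outputs = model(input_ids=inputs["input_ids"])
print(outputs.last_hidden_state.shape)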
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.