XLNet fails with attn_type "uni"
System Info
- transformers version: 4.26.1
- Platform: Linux-5.10.104-linuxkit-aarch64-with-glibc2.17
- Python version: 3.8.16
- Huggingface_hub version: 0.12.0
- PyTorch version (GPU?): 1.13.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help?
@thomwolf
Information
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("xlnet-base-cased")
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
# Switch the XLNet backbone to unidirectional ("uni") attention
model.transformer.attn_type = "uni"
inputs = tokenizer(["Hello, my dog is cute", "Hello, my dog is cute too"], return_tensors="pt", padding=True)
print(inputs)
outputs = model(**inputs)
Printed inputs, followed by the error:
{'input_ids': tensor([[ 5, 17, 11368, 19, 94, 2288, 27, 10920, 4, 3],
[ 17, 11368, 19, 94, 2288, 27, 10920, 269, 4, 3]]), 'token_type_ids': tensor([[3, 0, 0, 0, 0, 0, 0, 0, 0, 2],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 2]]), 'attention_mask': tensor([[0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
Traceback (most recent call last):
File "xlnet.py", line 70, in <module>
outputs = model(**inputs)
File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/vscode/.local/lib/python3.8/site-packages/transformers/models/xlnet/modeling_xlnet.py", line 1547, in forward
transformer_outputs = self.transformer(
File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/vscode/.local/lib/python3.8/site-packages/transformers/models/xlnet/modeling_xlnet.py", line 1161, in forward
attn_mask += data_mask[:, :, :, None]
RuntimeError: output with shape [10, 10, 1, 1] doesn't match the broadcast shape [10, 10, 2, 1]
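For context, the failing line is an in-place add whose right-hand side broadcasts to a shape larger than the destination: with attn_type "uni" the causal attn_mask keeps a singleton batch dimension, while the padding-derived data_mask carries the real batch size. A minimal sketch of the same broadcasting rule in plain PyTorch (shapes copied from the traceback above, variable names reused for illustration only):

import torch

# attn_mask after create_mask(qlen, mlen)[:, :, None, None]: [qlen, qlen, 1, 1]
attn_mask = torch.zeros(10, 10, 1, 1)
# data_mask[:, :, :, None] carries the batch size: [1, qlen, bsz, 1] with bsz = 2
data_mask = torch.zeros(1, 10, 2, 1)

# An out-of-place add broadcasts to [10, 10, 2, 1] and succeeds.
print((attn_mask + data_mask).shape)  # torch.Size([10, 10, 2, 1])

# An in-place add cannot enlarge the destination's size-1 dimension,
# which is the RuntimeError raised in modeling_xlnet.py line 1161.
attn_mask += data_mask  # RuntimeError: output with shape [10, 10, 1, 1] ...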
Expected behavior
A successful forward pass with the unidirectional (causal) attention mask applied.
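One hedged workaround until this is fixed: run the model one sequence at a time, so the padding-derived mask keeps a batch dimension of 1 and the in-place add above still broadcasts. This only sidesteps the crash; I have not checked whether the "uni" masks are otherwise what you want.

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("xlnet-base-cased")
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model.transformer.attn_type = "uni"

# A single sequence keeps data_mask at batch size 1, so the broadcast fits.
single = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**single)
print(outputs.logits.shape)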
cc @ArthurZucker and @younesbelkada
This is a fairly old model 😅 It does make sense to drop "uni": first because it is not working and has not bothered anyone until now, but also because we can just redirect people to TransformerXL. Thanks for reporting!
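For anyone following that redirect, a minimal sketch of loading Transformer-XL instead (assuming the transfo-xl-wt103 checkpoint; Transformer-XL is causal by design, so there is no attn_type switch, but it is not a drop-in replacement for an XLNet classifier):

from transformers import TransfoXLModel, TransfoXLTokenizer

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLModel.from_pretrained("transfo-xl-wt103")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
# Transformer-XL's forward does not take an attention_mask, so pass input_ids only.
outputs = model(input_ids=inputs["input_ids"])
print(outputs.last_hidden_state.shape)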
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.