transformers

XLNet fails with attn_type "uni"

Open jppgks opened this issue 2 years ago • 2 comments

System Info

  • transformers version: 4.26.1
  • Platform: Linux-5.10.104-linuxkit-aarch64-with-glibc2.17
  • Python version: 3.8.16
  • Huggingface_hub version: 0.12.0
  • PyTorch version (GPU?): 1.13.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help?

@thomwolf

Information

  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("xlnet-base-cased")
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")

# Set attention type
model.transformer.attn_type = "uni"

inputs = tokenizer(["Hello, my dog is cute", "Hello, my dog is cute too"], return_tensors="pt", padding=True)
print(inputs)
outputs = model(**inputs)

Output and error:

{'input_ids': tensor([[    5,    17, 11368,    19,    94,  2288,    27, 10920,     4,     3],
        [   17, 11368,    19,    94,  2288,    27, 10920,   269,     4,     3]]), 'token_type_ids': tensor([[3, 0, 0, 0, 0, 0, 0, 0, 0, 2],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]]), 'attention_mask': tensor([[0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
Traceback (most recent call last):
  File "xlnet.py", line 70, in <module>
    outputs = model(**inputs)
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vscode/.local/lib/python3.8/site-packages/transformers/models/xlnet/modeling_xlnet.py", line 1547, in forward
    transformer_outputs = self.transformer(
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vscode/.local/lib/python3.8/site-packages/transformers/models/xlnet/modeling_xlnet.py", line 1161, in forward
    attn_mask += data_mask[:, :, :, None]
RuntimeError: output with shape [10, 10, 1, 1] doesn't match the broadcast shape [10, 10, 2, 1]
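
The failure can be reproduced without the model. The following is a minimal sketch, not the library's code: the shape of `attn_mask` is taken from the error message, while the shape of `data_mask` and the contents of both tensors are assumptions chosen only to produce the same broadcast. It suggests that the in-place `+=` is what fails, because an in-place operation cannot grow its output from a size-1 batch dimension to the batch size of 2, whereas an out-of-place addition broadcasts without complaint.

```python
import torch

seq_len, batch_size = 10, 2

# Causal-style mask with singleton batch and head dimensions: [10, 10, 1, 1].
# The shape comes from the error message; the values are placeholders.
attn_mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1)[:, :, None, None]

# Assumed padding mask with a real batch dimension: [1, 10, 2].
data_mask = torch.zeros(1, seq_len, batch_size)

# In-place addition: the result would need shape [10, 10, 2, 1],
# which does not fit into attn_mask's existing storage.
try:
    attn_mask += data_mask[:, :, :, None]
    inplace_failed = False
except RuntimeError as err:
    inplace_failed = True
    print(err)

# Out-of-place addition allocates a new tensor with the broadcast shape.
combined = attn_mask + data_mask[:, :, :, None]
print(combined.shape)
```

Whether an out-of-place addition would also yield the intended masking inside the model is not verified here; the sketch only isolates the shape error.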

Expected behavior

Successful forward pass with the appropriate attention masks applied.

jppgks, Feb 14 '23 14:02

cc @ArthurZucker and @younesbelkada

sgugger, Feb 14 '23 14:02

This is a fairly old model 😅 It does make sense to drop "uni": it is not working, and that did not bother anyone until now. We could also just redirect to the new TransformerXL. Thanks for reporting!

ArthurZucker, Feb 14 '23 16:02

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot], May 08 '23 15:05