
bert-large-uncased gives `(1024) must match the size of tensor b (512) at non-singleton dimension 1` error

Open · monk1337 opened this issue 3 years ago · 2 comments

System Info

Python: 3.6, transformers: 4.18.0

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

I am trying to use bert-large-uncased to encode long sequences, but it raises an error.

Code:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = BertModel.from_pretrained("bert-large-uncased")

text = "Replace me by any text you'd like."*1024
encoded_input = tokenizer(text, truncation=True, max_length=1024, return_tensors='pt')
output = model(**encoded_input)

It's giving the following error :

~/.local/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    218         if self.position_embedding_type == "absolute":
    219             position_embeddings = self.position_embeddings(position_ids)
--> 220             embeddings += position_embeddings
    221         embeddings = self.LayerNorm(embeddings)
    222         embeddings = self.dropout(embeddings)

RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 1

I also tried to change the default size of the positional embedding:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = BertModel.from_pretrained("bert-large-uncased")
model.config.max_position_embeddings = 1024
text = "Replace me by any text you'd like."*1024
encoded_input = tokenizer(text, truncation=True, max_length=1024, return_tensors='pt')
output = model(**encoded_input)

But the error persists. How can I use the large model for sequences of length 1024?

Expected behavior

Expecting output of length 1024 given an input sequence of length 1024.

monk1337 · Aug 06 '22 11:08

Hi @monk1337, the loaded model has a maximum sequence length of 512 tokens.

If you pass max_position_embeddings=1024 to from_pretrained, i.e. model = BertModel.from_pretrained("bert-large-uncased", max_position_embeddings=1024), the model won't load because the checkpoint's position embedding weights are sized for 512 tokens (tensor size mismatch).

If you set model.config.max_position_embeddings = 1024 after loading, it has no effect because the weights have already been created for 512 positions.

Some models have a model.resize_position_embeddings(1024) method (e.g. Pegasus), but BERT does not. You have to:

  • load the model
  • set model.config.max_position_embeddings = 1024
  • manually resize both the model.embeddings.position_ids and model.embeddings.position_embeddings.weight.data tensors (a sketch is given after this list)
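
Here is a minimal sketch of those three steps, assuming the BertEmbeddings internals of transformers 4.x (a position_ids buffer plus a position_embeddings module); it is not an official API, and the added positions are untrained, so expect degraded quality without fine-tuning:

import torch
from transformers import BertTokenizer, BertModel

new_max = 1024

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertModel.from_pretrained("bert-large-uncased")

# Pretrained position table: shape (512, hidden) for bert-large-uncased
old_emb = model.embeddings.position_embeddings.weight.data
old_max, hidden = old_emb.shape

# Tile the pretrained rows to fill the enlarged table (one simple choice among several)
reps = (new_max + old_max - 1) // old_max
new_emb = old_emb.repeat(reps, 1)[:new_max].clone()

model.config.max_position_embeddings = new_max
model.embeddings.position_embeddings = torch.nn.Embedding(new_max, hidden)
model.embeddings.position_embeddings.weight.data = new_emb
model.embeddings.position_ids = torch.arange(new_max).expand((1, -1))
# Some transformers versions also keep a token_type_ids buffer of the same shape; resize it too if present
if hasattr(model.embeddings, "token_type_ids"):
    model.embeddings.token_type_ids = torch.zeros((1, new_max), dtype=torch.long)

text = "Replace me by any text you'd like." * 1024
encoded_input = tokenizer(text, truncation=True, max_length=new_max, return_tensors="pt")
output = model(**encoded_input)  # last_hidden_state has shape (1, 1024, 1024)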

Note that the way you resize model.embeddings.position_embeddings.weight.data can have a significant effect on the quality of predictions as you add new untrained parameters and vanilla attention has poor extrapolation capabilities.
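
For instance, instead of tiling, the pretrained table can be stretched by linear interpolation (a sketch reusing old_emb, new_max and model from the snippet above; whether this works better is an empirical question):

import torch.nn.functional as F

# Alternative resize (sketch): stretch the 512 pretrained position vectors to new_max rows
# by linear interpolation instead of repeating them
new_emb = F.interpolate(
    old_emb.t().unsqueeze(0),  # (1, hidden, 512)
    size=new_max, mode="linear", align_corners=True,
).squeeze(0).t().contiguous()  # (new_max, hidden)
model.embeddings.position_embeddings.weight.data = new_emb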

If you don't mind switching to an efficient attention mechanism, you can use my repo to convert your model and process long sequences while preserving the quality of its predictions.

ccdv-ai · Aug 06 '22 14:08

@ccdv-ai That's helpful; I was checking out the repo, excellent work! It would be great if you could provide a simple working classification example on Colab.

monk1337 · Aug 06 '22 15:08

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] · Sep 05 '22 15:09