bert-large-uncased gives `(1024) must match the size of tensor b (512) at non-singleton dimension 1` error
System Info
Python: 3.6
transformers version: 4.18.0
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
I am trying to use bert-large-uncased for long sequence encoding, but it raises an error:
Code:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = BertModel.from_pretrained("bert-large-uncased")
text = "Replace me by any text you'd like."*1024
encoded_input = tokenizer(text, truncation=True, max_length=1024, return_tensors='pt')
output = model(**encoded_input)
It gives the following error:
~/.local/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
218 if self.position_embedding_type == "absolute":
219 position_embeddings = self.position_embeddings(position_ids)
--> 220 embeddings += position_embeddings
221 embeddings = self.LayerNorm(embeddings)
222 embeddings = self.dropout(embeddings)
RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 1
I also tried changing the default size of the position embeddings:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = BertModel.from_pretrained("bert-large-uncased")
model.config.max_position_embeddings = 1024
text = "Replace me by any text you'd like."*1024
encoded_input = tokenizer(text, truncation=True, max_length=1024, return_tensors='pt')
output = model(**encoded_input)
But the error persists. How can I use the large model for sequences of length 1024?
Expected behavior
Expecting an output of sequence length 1024 when the input sequence length is 1024.
Hi @monk1337, the loaded model has a maximum sequence length of 512 tokens.
If you use:
model = BertModel.from_pretrained("bert-large-uncased", max_position_embeddings=1024)
the model won't load, because the checkpoint itself only stores 512 position embeddings (the tensor sizes don't match).
If you set model.config.max_position_embeddings = 1024 after loading, this has no effect because the model is already loaded with 512 tokens.
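A quick way to see this (attribute names as in the traceback above, a sketch rather than anything official): the config field changes, but the tensors built at load time keep their checkpoint shapes.

from transformers import BertModel

model = BertModel.from_pretrained("bert-large-uncased")
model.config.max_position_embeddings = 1024

# The config field is updated, but the tensors created at load time are not:
print(model.config.max_position_embeddings)               # 1024
print(model.embeddings.position_embeddings.weight.shape)  # torch.Size([512, 1024])
print(model.embeddings.position_ids.shape)                # torch.Size([1, 512])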
Some models have a model.resize_position_embeddings(1024) method (e.g. Pegasus), but this is not the case for BERT.
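For comparison, a minimal sketch of that Pegasus API; the checkpoint name is only an illustrative assumption, and is assumed to ship with 512 positions:

from transformers import PegasusModel

# resize_position_embeddings exists for Pegasus models; the checkpoint below
# is an example choice, not one taken from this issue.
model = PegasusModel.from_pretrained("google/pegasus-xsum")
model.resize_position_embeddings(1024)
print(model.config.max_position_embeddings)  # should now report 1024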
For BERT you have to:
- load the model
- set model.config.max_position_embeddings = 1024
- manually resize both the model.embeddings.position_ids and model.embeddings.position_embeddings.weight.data tensors (see the sketch below)
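Here is a minimal sketch of those three steps, using the internal attribute names visible in the traceback above (transformers 4.18). How the extra 512 rows are initialized is a choice, not a library recipe; repeating the pretrained rows below is purely for illustration.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertModel.from_pretrained("bert-large-uncased")

new_max = 1024
old_weights = model.embeddings.position_embeddings.weight.data  # shape: (512, 1024)
old_max, hidden = old_weights.shape

# Build a new (1024, hidden) position-embedding table: the first 512 rows are
# copied from the checkpoint, the remaining rows are filled by repeating them.
# Random init or interpolation are alternatives with different quality trade-offs.
new_weights = old_weights.new_empty((new_max, hidden))
new_weights[:old_max] = old_weights
new_weights[old_max:] = old_weights[: new_max - old_max]

# Swap in the larger embedding table and extend the position_ids buffer.
model.embeddings.position_embeddings = torch.nn.Embedding.from_pretrained(new_weights, freeze=False)
model.embeddings.position_ids = torch.arange(new_max).expand((1, -1))
model.config.max_position_embeddings = new_max

# The BERT tokenizer returns token_type_ids, so the model's internal
# token_type_ids buffer (still sized 512) is not used in this call.
text = "Replace me by any text you'd like." * 1024
encoded_input = tokenizer(text, truncation=True, max_length=new_max, return_tensors="pt")
output = model(**encoded_input)
print(output.last_hidden_state.shape)  # torch.Size([1, 1024, 1024]) for bert-large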
Note that the way you resize model.embeddings.position_embeddings.weight.data can have a significant effect on the quality of predictions as you add new untrained parameters and vanilla attention has poor extrapolation capabilities.
If you don't mind switching to an efficient attention mechanism, you can use my repo to convert your model and process long sequences while preserving the quality of its predictions.
@ccdv-ai That's helpful; I was checking the repo; excellent work! It would be great if you could provide a simple working classification example on Colab.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.