Position Embedding with Seq > 512
I am trying to run Llama-3.1-8B with a sequence length > 512 and I get the error below. Do I have to set the position embeddings manually to get this to work?
from airllm import AutoModel

model = AutoModel.from_pretrained("meta-llama/Meta-Llama-3.1-8B", delete_original=True)

# Long dummy prompts, tokenized to 1024 tokens each.
prompts = ["a " * 10000 for i in range(100)]
model.tokenizer.pad_token = model.tokenizer.eos_token
input_tokens = model.tokenizer(
    prompts,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=1024,
)

generation_output = model.forward(
    input_ids=input_tokens['input_ids'].cuda(),
    attention_mask=input_tokens['attention_mask'].cuda(),
    use_cache=False,
)
Fetching 13 files: 100% 13/13 [00:00<00:00, 986.88it/s]
found_layers:{'model.embed_tokens.': True, 'model.layers.0.': True, 'model.layers.1.': True, 'model.layers.2.': True, 'model.layers.3.': True, 'model.layers.4.': True, 'model.layers.5.': True, 'model.layers.6.': True, 'model.layers.7.': True, 'model.layers.8.': True, 'model.layers.9.': True, 'model.layers.10.': True, 'model.layers.11.': True, 'model.layers.12.': True, 'model.layers.13.': True, 'model.layers.14.': True, 'model.layers.15.': True, 'model.layers.16.': True, 'model.layers.17.': True, 'model.layers.18.': True, 'model.layers.19.': True, 'model.layers.20.': True, 'model.layers.21.': True, 'model.layers.22.': True, 'model.layers.23.': True, 'model.layers.24.': True, 'model.layers.25.': True, 'model.layers.26.': True, 'model.layers.27.': True, 'model.layers.28.': True, 'model.layers.29.': True, 'model.layers.30.': True, 'model.layers.31.': True, 'model.norm.': True, 'lm_head.': True}
saved layers already found in /root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3.1-8B/snapshots/48d6d0fc4e02fb1269b36940650a1b7233035cbb/splitted_model
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
running layers(cuda:0): 3%|▎ | 1/35 [00:01<00:35, 1.04s/it]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-14-c76626e36e7d> in <cell line: 15>()
     13 )
     14
---> 15 generation_output = model.forward(
     16     input_ids=input_tokens['input_ids'].cuda(),
     17     attention_mask=input_tokens['attention_mask'].cuda(),

7 frames

/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py in apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim)
    217     cos = cos.unsqueeze(unsqueeze_dim)
    218     sin = sin.unsqueeze(unsqueeze_dim)
--> 219     q_embed = (q * cos) + (rotate_half(q) * sin)
    220     k_embed = (k * cos) + (rotate_half(k) * sin)
    221     return q_embed, k_embed
RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 2
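The mismatch seems to come from the rotary cos/sin tables: they appear to be precomputed for 512 positions (the default maximum sequence length), while the queries here cover 1024 positions. A minimal sketch in plain PyTorch (not airllm or transformers code; the shapes are illustrative) that reproduces the same broadcast failure:

import torch

q = torch.randn(1, 32, 1024, 128)  # (batch, heads, seq_len=1024, head_dim)
cos = torch.randn(1, 512, 128)     # rotary table covering only 512 positions
cos = cos.unsqueeze(1)             # unsqueeze_dim=1, as in apply_rotary_pos_emb

# Raises: The size of tensor a (1024) must match the size of tensor b (512)
# at non-singleton dimension 2
q_embed = q * cos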
Fixed with:

model.max_seq_len = 1024
model.init_model()

Maybe this should be an init option.
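For reference, here is the repro above with the workaround applied before calling forward (a sketch based on the fix in this thread; it assumes max_seq_len and init_model() behave as shown there):

from airllm import AutoModel

model = AutoModel.from_pretrained("meta-llama/Meta-Llama-3.1-8B", delete_original=True)

# Workaround from above: raise the position-embedding length to the
# tokenizer's max_length, then rebuild the model.
model.max_seq_len = 1024
model.init_model()

model.tokenizer.pad_token = model.tokenizer.eos_token
input_tokens = model.tokenizer(
    ["a " * 10000],
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=1024,
)
generation_output = model.forward(
    input_ids=input_tokens['input_ids'].cuda(),
    attention_mask=input_tokens['attention_mask'].cuda(),
    use_cache=False,
)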