
Why double the max sequence length when precomputing the frequencies for rotary embeddings?

Open sh0416 opened this issue 2 years ago • 3 comments

https://github.com/facebookresearch/llama/blob/57b0eb62de0636e75af471e49e2f1862d908d9d8/llama/model.py#L219

Can anyone explain why the sequence length is doubled here?
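For reference, the linked line sits in Transformer.__init__ and looks roughly like this (paraphrased, so exact names may differ slightly from the file):

```python
# Paraphrased from llama/model.py: the rotary frequency table is built once,
# for twice as many positions as the configured maximum sequence length.
self.freqs_cis = precompute_freqs_cis(
    self.params.dim // self.params.n_heads,  # rotary dim = per-head dim
    self.params.max_seq_len * 2,             # note the doubling
)
```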

sh0416 • Apr 11 '23 13:04

To the best of my knowledge, we don't have to constrain the maximum sequence length when using rotary embeddings, because no learnable parameter depends on the sequence length. Empirically, though, the model does not work well on sequences longer than those it was trained on.

Does max_seq_len in the configuration mean that LLaMA was trained on sequences of at most 2048 tokens? If so, what does max_seq_len * 2 mean? Is it just a trick to implement RoPE?
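For what it's worth, here is a minimal, self-contained sketch of the precomputation (assuming PyTorch and modeling it loosely on the linked function, so treat the body as an approximation rather than the exact source). It is a pure function of the position index and the per-head dimension, with no learnable parameters, which is why the table length is an implementation choice rather than an architectural limit:

```python
import torch

def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0) -> torch.Tensor:
    # Per-dimension rotation rates: theta^(-2i/dim) for i = 0 .. dim/2 - 1.
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    # Absolute positions 0 .. end-1; nothing here is learned.
    t = torch.arange(end, dtype=torch.float32)
    # Outer product gives one rotation angle per (position, frequency) pair.
    angles = torch.outer(t, freqs)
    # Complex unit vectors e^(i * angle); the forward pass slices just the rows it needs.
    return torch.polar(torch.ones_like(angles), angles)

# Training uses sequences of up to max_seq_len tokens, but the table covers
# max_seq_len * 2 positions, so there is headroom beyond the training length.
max_seq_len, dim, n_heads = 2048, 4096, 32
freqs_cis = precompute_freqs_cis(dim // n_heads, max_seq_len * 2)
print(freqs_cis.shape)  # torch.Size([4096, 64])
```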

sh0416 • Apr 11 '23 13:04

Can we increase the context length, say up to 4k, by fine-tuning?

milsun • Apr 12 '23 11:04

I don't think there is a definitive yes or no here. A lot of this in deep learning is empirical: you can try it, but I can't guarantee it will work.
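Mechanically, nothing in the architecture stops you: no weight shape depends on the sequence length, so extending the context only means precomputing a longer rotary table and then fine-tuning on longer sequences. A sketch, reusing the hypothetical precompute_freqs_cis helper from the snippet above:

```python
# Sketch only: reuses precompute_freqs_cis from the earlier snippet.
# Extending the context is just a bigger position table; no weight changes shape.
dim, n_heads, new_max_seq_len = 4096, 32, 4096
freqs_cis_4k = precompute_freqs_cis(dim // n_heads, new_max_seq_len * 2)
# Whether the model actually performs well past 2048 tokens is a separate,
# empirical question that fine-tuning may or may not fix.
```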

sh0416 • Apr 12 '23 12:04