long_llama
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
I am interested in loading LongLLaMA with the Mojo framework, as mentioned here https://github.com/tairov/llama2.mojo, to increase model speed while applying 4-bit quantization for model compression. Could you provide guidance...
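For the quantization half of this question, a minimal sketch using Hugging Face transformers with bitsandbytes 4-bit loading (independent of llama2.mojo, which as far as I know expects llama2.c-style exported weights) might look like the following; the checkpoint name `syzymon/long_llama_3b`, the prompt, and the generation settings are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "syzymon/long_llama_3b"  # assumed checkpoint name

# 4-bit NF4 quantization via bitsandbytes; compute in fp16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,  # LongLLaMA ships custom modeling code on the Hub
)

inputs = tokenizer("The Focused Transformer (FoT) is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```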
Could you share a way to contact you? I copied the code and moved both the model and the input to the GPU, and the results are gibberish that makes no sense...
How much VRAM is needed to fine-tune the 3B model? Is 12 GB enough?
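Not an authoritative answer, but a back-of-envelope estimate (standard mixed-precision Adam assumptions, no activations counted) suggests full fine-tuning of a 3B model does not fit in 12 GB, while parameter-efficient fine-tuning can:

```python
# Rough VRAM estimate for full fine-tuning of a 3B-parameter model with Adam
# in fp16/bf16 mixed precision. These are assumptions, not measurements.
params = 3e9
weights_fp16 = params * 2   # model weights
grads_fp16   = params * 2   # gradients
adam_m_fp32  = params * 4   # Adam first moment
adam_v_fp32  = params * 4   # Adam second moment
master_fp32  = params * 4   # fp32 master copy of the weights
total_gb = (weights_fp16 + grads_fp16 + adam_m_fp32 + adam_v_fp32 + master_fp32) / 1024**3
print(f"~{total_gb:.0f} GB before activations")  # ~45 GB, so 12 GB is far too little
# With LoRA/QLoRA-style tuning (frozen 4/8-bit base weights plus small adapters),
# a 3B model typically needs well under 12 GB, so that route should fit.
```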
Hi, I saw the paper mention that C_curr and C_prev come from the same document in the batch, but didn't really see how this is implemented. It seems that in...
I don’t know much about how cross-batch data is loaded during training.
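I don't know the repo's actual loader either, but one way to realize "C_curr and C_prev come from the same document" is to pin one document per batch slot and step through its chunks in order, so the previous chunk in a slot is C_prev and the current one is C_curr; everything below (names, shapes) is a hypothetical sketch, not the authors' code:

```python
from typing import Dict, Iterator, List

def cross_batch_chunks(docs: List[List[int]], chunk_len: int, batch_size: int) -> Iterator[Dict[str, list]]:
    """Hypothetical batching sketch: each batch slot is tied to one document and
    walks its chunks in order, so c_prev/c_curr in a slot come from the same
    document, while the other slots naturally provide cross-document negatives."""
    # Split every document into fixed-length, non-overlapping chunks.
    streams = [
        [doc[i:i + chunk_len] for i in range(0, len(doc) - chunk_len + 1, chunk_len)]
        for doc in docs
    ]
    active = streams[:batch_size]  # one document stream per batch slot
    step = 1
    while all(len(s) > step for s in active):
        yield {
            "c_prev": [s[step - 1] for s in active],  # previous chunk, same document
            "c_curr": [s[step] for s in active],      # current chunk, same document
        }
        step += 1

# Example: two toy "documents" of token ids, chunk length 3, batch size 2.
batches = list(cross_batch_chunks([list(range(12)), list(range(100, 112))], 3, 2))
```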
Thanks for your awesome work! There is a small problem: when I fine-tune long_llama with gradient_checkpointing, it raises an error. Could you please update the code in transformers to...
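Without the full traceback it's hard to say which update is needed, but a common cause when gradient checkpointing errors out during fine-tuning is a conflict with the KV cache; a sketch of the usual workaround (checkpoint name assumed) is:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_3b",            # assumed checkpoint name
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Gradient checkpointing and the generation KV cache do not mix; disabling the
# cache is the usual first step when checkpointed fine-tuning raises errors.
model.config.use_cache = False
model.gradient_checkpointing_enable()
# If the base model is frozen (e.g. LoRA adapters on top), inputs must also
# require grads for the checkpointed segments to backpropagate.
model.enable_input_require_grads()
```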
Hi, can you provide the code or more detail on how you zero-shot evaluate the arXiv dataset? I cannot get good results when trying arXiv summarization. I guess it...
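Not the evaluation code from the paper, but a plain zero-shot baseline to compare against could be a simple "TL;DR"-style prompt; the prompt format, truncation length, and checkpoint name below are all assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "syzymon/long_llama_3b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True, device_map="auto")

def zero_shot_summary(article: str, max_input_tokens: int = 6144) -> str:
    # Append a summarization cue and let the model continue; greedy decoding
    # keeps the comparison deterministic.
    prompt = f"{article}\n\nTL;DR:"
    inputs = tokenizer(prompt, return_tensors="pt",
                       truncation=True, max_length=max_input_tokens).to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Keep only the newly generated tokens (drop the prompt).
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```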
I have a question about the rotary positional encoding part of the code. Your code:
```
def rotate_as_if_first(x, rotary_emb):
    # x: [bs, num_attention_heads, seq_len, head_size]
    # apply rotary as...
```
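For what it's worth, one reading of "rotate as if first" is that every token is rotated with the cos/sin of position 0, i.e. as if it sat at the start of the sequence; the sketch below illustrates that reading only and is not the repo's exact implementation:

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Standard RoPE helper: negate and swap the two halves of the head dimension.
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def rotate_as_if_position_zero(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: [bs, num_heads, seq_len, head_dim]; cos/sin: [max_pos, head_dim].
    # Every token is rotated with the position-0 angles, regardless of its real position.
    cos0 = cos[0].view(1, 1, 1, -1)
    sin0 = sin[0].view(1, 1, 1, -1)
    return x * cos0 + rotate_half(x) * sin0

# Since cos(0) == 1 and sin(0) == 0, the position-0 rotation is the identity,
# so tensors treated this way effectively carry no rotary positional information.
head_dim, seq_len = 8, 4
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
freqs = torch.outer(torch.arange(32).float(), inv_freq)
emb = torch.cat((freqs, freqs), dim=-1)
cos, sin = emb.cos(), emb.sin()
x = torch.randn(1, 2, seq_len, head_dim)
assert torch.allclose(rotate_as_if_position_zero(x, cos, sin), x)
```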