long_llama
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
I am interested in loading LongLLaMA with the Mojo framework, as mentioned here https://github.com/tairov/llama2.mojo, to increase model speed while applying 4-bit quantization for model compression. Could you provide guidance...
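For the quantization half of this question, a minimal sketch using Hugging Face transformers with bitsandbytes 4-bit loading (independent of llama2.mojo, which as far as I know expects llama2.c-style exported weights) might look like the following; the checkpoint name `syzymon/long_llama_3b`, the prompt, and the generation settings are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "syzymon/long_llama_3b"  # assumed checkpoint name

# 4-bit NF4 quantization via bitsandbytes; compute in fp16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,  # LongLLaMA ships custom modeling code on the Hub
)

inputs = tokenizer("The Focused Transformer (FoT) is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```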
Could you share a way to contact you? I copied the code and moved both the model and the input to the GPU, and the results are gibberish that makes no sense...
How much VRAM is needed to fine-tune the 3B model? Is 12 GB enough?
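Not an authoritative answer, but a back-of-envelope estimate (standard mixed-precision Adam assumptions, no activations counted) suggests full fine-tuning of a 3B model does not fit in 12 GB, while parameter-efficient fine-tuning can:

```python
# Rough VRAM estimate for full fine-tuning of a 3B-parameter model with Adam
# in fp16/bf16 mixed precision. These are assumptions, not measurements.
params = 3e9
weights_fp16 = params * 2   # model weights
grads_fp16   = params * 2   # gradients
adam_m_fp32  = params * 4   # Adam first moment
adam_v_fp32  = params * 4   # Adam second moment
master_fp32  = params * 4   # fp32 master copy of the weights
total_gb = (weights_fp16 + grads_fp16 + adam_m_fp32 + adam_v_fp32 + master_fp32) / 1024**3
print(f"~{total_gb:.0f} GB before activations")  # ~45 GB, so 12 GB is far too little
# With LoRA/QLoRA-style tuning (frozen 4/8-bit base weights plus small adapters),
# a 3B model typically needs well under 12 GB, so that route should fit.
```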
Hi, I saw the paper mention that C_curr and C_prev come from the same document in the batch, but didn't really see how this is implemented. It seems that in...
I don’t know much about how cross-batch data is loaded during training.
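I don't know the repo's actual loader either, but one way to realize "C_curr and C_prev come from the same document" is to pin one document per batch slot and step through its chunks in order, so the previous chunk in a slot is C_prev and the current one is C_curr; everything below (names, shapes) is a hypothetical sketch, not the authors' code:

```python
from typing import Dict, Iterator, List

def cross_batch_chunks(docs: List[List[int]], chunk_len: int, batch_size: int) -> Iterator[Dict[str, list]]:
    """Hypothetical batching sketch: each batch slot is tied to one document and
    walks its chunks in order, so c_prev/c_curr in a slot come from the same
    document, while the other slots naturally provide cross-document negatives."""
    # Split every document into fixed-length, non-overlapping chunks.
    streams = [
        [doc[i:i + chunk_len] for i in range(0, len(doc) - chunk_len + 1, chunk_len)]
        for doc in docs
    ]
    active = streams[:batch_size]  # one document stream per batch slot
    step = 1
    while all(len(s) > step for s in active):
        yield {
            "c_prev": [s[step - 1] for s in active],  # previous chunk, same document
            "c_curr": [s[step] for s in active],      # current chunk, same document
        }
        step += 1

# Example: two toy "documents" of token ids, chunk length 3, batch size 2.
batches = list(cross_batch_chunks([list(range(12)), list(range(100, 112))], 3, 2))
```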
Thanks for your awesome work! There is a small problem: when I fine-tune long_llama with gradient_checkpointing, it raises an error. Could you please update the code in transformers to...
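Without the full traceback it's hard to say which update is needed, but a common cause when gradient checkpointing errors out during fine-tuning is a conflict with the KV cache; a sketch of the usual workaround (checkpoint name assumed) is:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_3b",            # assumed checkpoint name
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Gradient checkpointing and the generation KV cache do not mix; disabling the
# cache is the usual first step when checkpointed fine-tuning raises errors.
model.config.use_cache = False
model.gradient_checkpointing_enable()
# If the base model is frozen (e.g. LoRA adapters on top), inputs must also
# require grads for the checkpointed segments to backpropagate.
model.enable_input_require_grads()
```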
Hi, can you provide the code or more detail on how you zero-shot evaluate the arXiv dataset? I cannot get good results when trying arXiv summarization. I guess it...
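Not the evaluation code from the paper, but a plain zero-shot baseline to compare against could be a simple "TL;DR"-style prompt; the prompt format, truncation length, and checkpoint name below are all assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "syzymon/long_llama_3b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True, device_map="auto")

def zero_shot_summary(article: str, max_input_tokens: int = 6144) -> str:
    # Append a summarization cue and let the model continue; greedy decoding
    # keeps the comparison deterministic.
    prompt = f"{article}\n\nTL;DR:"
    inputs = tokenizer(prompt, return_tensors="pt",
                       truncation=True, max_length=max_input_tokens).to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Keep only the newly generated tokens (drop the prompt).
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```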
I have a question about the rotary positional encoding part of the code. Your code:
```
def rotate_as_if_first(x, rotary_emb):
    # x: [bs, num_attention_heads, seq_len, head_size]
    # apply rotary as...
```
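For what it's worth, one reading of "rotate as if first" is that every token is rotated with the cos/sin of position 0, i.e. as if it sat at the start of the sequence; the sketch below illustrates that reading only and is not the repo's exact implementation:

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Standard RoPE helper: negate and swap the two halves of the head dimension.
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def rotate_as_if_position_zero(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: [bs, num_heads, seq_len, head_dim]; cos/sin: [max_pos, head_dim].
    # Every token is rotated with the position-0 angles, regardless of its real position.
    cos0 = cos[0].view(1, 1, 1, -1)
    sin0 = sin[0].view(1, 1, 1, -1)
    return x * cos0 + rotate_half(x) * sin0

# Since cos(0) == 1 and sin(0) == 0, the position-0 rotation is the identity,
# so tensors treated this way effectively carry no rotary positional information.
head_dim, seq_len = 8, 4
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
freqs = torch.outer(torch.arange(32).float(), inv_freq)
emb = torch.cat((freqs, freqs), dim=-1)
cos, sin = emb.cos(), emb.sin()
x = torch.randn(1, 2, seq_len, head_dim)
assert torch.allclose(rotate_as_if_position_zero(x, cos, sin), x)
```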