
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"

9 lolcats issues, sorted by recently updated

In the needle-in-a-haystack section of your paper, you mentioned: "However, linearizing with passkey samples (LoLCATs Llama 3 8B (Passkey)) recovers 100% accuracy." Does this step involve LoRA fine-tuning with passkey samples?...

Table 6 shows that LoLCATs distilled from Llama 3.1 8B gets 69.7% accuracy on Winogrande (while Llama 3.1 8B gets 73.5%). Then Tables 3/4 show that LoLCATs distilled from Llama...

Hi there, thanks for this open-source codebase, which is detailed and almost works fine for training and evaluation. But as mentioned in your [Blog Part 2](https://hazyresearch.stanford.edu/blog/2024-10-14-lolcats-p2), we're also fighting for A100s,...

Hi, thanks for open-sourcing your code and model weights. As the title says, I am trying to use TK kernels with the pre-linearized Llama 3.1 8B model and am unable to...

I think one experiment is missing from the experiment config folder: eval_Scrolls.yaml. It calls OurTrainer in finetune_seq2seq.py during the eval phase and uses the SCROLLS dataset, but doesn't precede the...

Since I couldn't find any examples of Mistral 8x22B models in the GitHub repository, are there any examples I can refer to, such as Llama 70B or Llama 405B, if I want...

Thanks for releasing this great work! I was able to get the training to run with sequence length 1024 on the Llama 8B model on 24GB GPUs. I would like...

Thanks for your work! Since the kernel is used to fit the softmax attention distribution, why did you choose MSE as the loss function instead of a KL loss? Can you give an explanation...
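
For readers of this thread, the contrast between the two options can be illustrated with a minimal PyTorch sketch (this is not the repo's actual loss code; all tensor names below are hypothetical): an MSE compares the attention *outputs* of the linear and softmax branches, while a KL divergence compares the attention *weight* distributions themselves.

```python
# Minimal sketch (assumption: not LoLCATs' actual implementation) contrasting
# an MSE loss on attention outputs with a KL loss on attention weights.
import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 4, 128, 64

# Outputs of the learnable linear-attention branch and the softmax "teacher".
pred_out = torch.randn(batch, heads, seq, dim)   # linear attention outputs
true_out = torch.randn(batch, heads, seq, dim)   # softmax attention outputs

# Option 1: MSE directly on the attention outputs (value-weighted sums).
mse_loss = F.mse_loss(pred_out, true_out)

# Option 2: KL divergence on the attention weight distributions.
pred_weights = torch.softmax(torch.randn(batch, heads, seq, seq), dim=-1)
true_weights = torch.softmax(torch.randn(batch, heads, seq, seq), dim=-1)
kl_loss = F.kl_div(pred_weights.clamp_min(1e-9).log(), true_weights,
                   reduction="batchmean")
```

One practical difference: the output-level MSE never needs the full seq x seq attention-weight matrix materialized, whereas the weight-level KL does.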

Hi, thanks for your great work! I have a question regarding the block-by-block training method described in the paper. In Listing 8, the training process for the linear attention branch...
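
Since the original question is truncated, here is a rough sketch of what block-by-block training of a linear attention branch could look like, to anchor the discussion; the module names (`softmax_attn`, `linear_attn`) and the cached-input scheme are hypothetical, and this is not a reproduction of the paper's Listing 8.

```python
# Rough sketch (assumption: not the paper's Listing 8) of block-by-block
# attention transfer: each block's linear-attention branch is trained in
# isolation to match that block's frozen softmax attention outputs.
import torch
import torch.nn.functional as F

def train_block_by_block(blocks, block_inputs, num_steps=100, lr=1e-3):
    """`blocks` are modules exposing `.softmax_attn` and `.linear_attn`
    (hypothetical names); `block_inputs[i]` are cached hidden states fed
    into block i."""
    for i, block in enumerate(blocks):
        opt = torch.optim.AdamW(block.linear_attn.parameters(), lr=lr)
        x = block_inputs[i]
        with torch.no_grad():
            target = block.softmax_attn(x)   # frozen softmax "teacher" output
        for _ in range(num_steps):
            pred = block.linear_attn(x)      # learnable linear-attention branch
            loss = F.mse_loss(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```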