llama2.c
Adding LoRA fine tuning
Adding an implementation of LoRA fine-tuning, heavily inspired by minLoRA. I thought the use of PyTorch parametrizations was interesting and simple, and it fits in nicely with the approach of this project. Let me know if you were thinking of explicitly implementing the modified forward pass rather than a factored/merged forward pass, or if you think this would be a better fit as a separate repo.
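For context, the parametrization idea boils down to something like this (a rough sketch in the spirit of minLoRA, not the exact code in this PR; `LoRAParametrization` and `add_lora` are illustrative names):

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class LoRAParametrization(nn.Module):
    def __init__(self, fan_in, fan_out, rank=4, alpha=1.0):
        super().__init__()
        # only these low-rank factors are trained; A starts random, B starts at zero
        self.lora_A = nn.Parameter(torch.zeros(rank, fan_in))
        self.lora_B = nn.Parameter(torch.zeros(fan_out, rank))
        nn.init.kaiming_uniform_(self.lora_A, a=5 ** 0.5)
        self.scale = alpha / rank

    def forward(self, weight):
        # called whenever linear.weight is accessed: return W + (B @ A) * scale,
        # i.e. the "merged" weight, so the base module's forward pass is untouched
        return weight + (self.lora_B @ self.lora_A) * self.scale

def add_lora(linear: nn.Linear, rank=4, alpha=1.0):
    fan_out, fan_in = linear.weight.shape
    parametrize.register_parametrization(
        linear, "weight", LoRAParametrization(fan_in, fan_out, rank, alpha)
    )
    # freeze the original weight; only lora_A / lora_B remain trainable
    linear.parametrizations["weight"].original.requires_grad_(False)
```

The merged formulation means sampling/export code doesn't need to know about LoRA at all; an explicit modified forward pass would instead compute x @ W.T + ((x @ A.T) @ B.T) * scale and skip forming the dense delta.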
I added the tinyshakespeare dataset and default to fine-tuning on that. I wanted to tune the tinystories models a small amount (~50-100 steps) to get Shakespearean tiny stories :) I had some mixed results, e.g.:
Once upon a time, there was a boy named Tom and a girl named Lily. They lived in a big, expensive house, but they did not need each other. Tom and Lily were always together, even on the same summer.
Tom and Lily played every day, but they did not get along very well. They would fight, argue, and not be friends. They were angry at each other and their answers. Their hearts were hopeful but unfat.
Finally, after all their plays, they said to each other:
"I thought you would be mad, I don't really want to be your friend!"
At first, they felt sad, but as they began to talk more, neither of them had the same thought and the same word. In the end, he was in a new state yet he was still alive.
Still mostly story-like, but certainly leaning more towards the drama of Shakespeare. I like the commentary on how being exposed to new and original thoughts can leave you in a new state of being. ;)
I also tuned this for ~1k steps with the 15M param model to get something that more closely resembles Shakespeare.
I only have access to a 1080 Ti and a V100 16GB, so I wasn't able to do more thorough testing/experimentation on the actual Llama 2 checkpoints. Let me know if you'd like to see more testing before making a decision on what to do with this.
Thanks for sharing this project! It's been fun to play with.
This looks elegant to me.
I like where this is going, but this looks like multiple PRs in one, and a little bit of sus code. I'll comment inline.
@wlamond If we're going to LoRA, then why not just go all the way and do QLoRA :)
It's a very simple change to your PR, just need to reference bnb.nn.Linear4bit for the 4-bit quantization.
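Roughly something like this (untested sketch; assumes bitsandbytes is installed, and the helper name is just illustrative):

```python
import torch
import bitsandbytes as bnb

def make_4bit(in_features, out_features, bias=True):
    # drop-in for nn.Linear; the weight is quantized to 4-bit when the module
    # is moved to a CUDA device, and the matmul runs in compute_dtype
    return bnb.nn.Linear4bit(
        in_features, out_features, bias=bias,
        compute_dtype=torch.bfloat16,
        quant_type="nf4",  # NF4 quantization, as used in the QLoRA paper
    )
```

The LoRA adapters would stay in 16/32-bit on top of the frozen 4-bit base weights.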
@vgoklani Oooo, I do like that idea. I think it would be better as a separate PR though. I'm not sure how Andrej feels about adding other dependencies, so I'd rather get this project finished and then add QLoRA as another option if there's interest. Thanks for the idea and feedback!
definitely interest. adds possibilities to potentially do more with less 🙇‍♂️
Another possible improvement: the original parameters don't need to be stored in the optimizer during LoRA fine-tuning.
@ecr23xx The configure_optimizers method only passes parameters with requires_grad == True to the optimizer, and it's called after we set up LoRA and freeze the original weights, so we should be all set here!
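For reference, that filtering looks roughly like this (paraphrased, not the exact code):

```python
import torch

def configure_optimizers(model, weight_decay, learning_rate, betas):
    # only parameters that still require grad (the LoRA factors, once the base
    # weights are frozen) ever reach the optimizer
    param_dict = {pn: p for pn, p in model.named_parameters() if p.requires_grad}
    decay_params = [p for n, p in param_dict.items() if p.dim() >= 2]
    nodecay_params = [p for n, p in param_dict.items() if p.dim() < 2]
    optim_groups = [
        {"params": decay_params, "weight_decay": weight_decay},
        {"params": nodecay_params, "weight_decay": 0.0},
    ]
    return torch.optim.AdamW(optim_groups, lr=learning_rate, betas=betas)
```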
Oh, I got it. Looking forward to your updates 🚀
@wlamond I'd love to do some experimentation with LoRA on various types of smaller models. Any chance this PR could be revived/updated?