Implement LoRA for efficient finetuning
Implements LoRA for efficient finetuning of parrot models
- [x] add finetuning script
- [x] add the howto guide
- [x] add tests
- [x] add generate script
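For context, a minimal sketch of the LoRA idea being added here (illustrative only, not the exact code in this PR): the pretrained weight is frozen and only a low-rank update, scaled by `alpha / r`, is trained.

```python
# Minimal LoRA sketch (illustrative, not this PR's implementation): freeze the
# pretrained linear layer and train only a low-rank update scaled by alpha / r.
import math
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        for p in self.linear.parameters():
            p.requires_grad = False  # pretrained weight (and bias) stay frozen
        # Low-rank factors: A is (r, in), B is (out, r). B starts at zero so the
        # adapted layer initially behaves exactly like the pretrained one.
        self.lora_A = nn.Parameter(torch.empty(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```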
Just ran a full Alpaca finetuning round for StableLM 3B and the results look great: the loss converges to ~1 and it generates very sensible outputs.
It takes 45 min on an A100 for Alpaca.
Should be ready to review (and hopefully merge) when you have time @lantiga @awaelchli @carmocca
Thanks for the review, I will try to address these cases tonight / tomorrow morning. Btw, a question regarding the CI: it looks like it got automatically canceled. Or is there an issue with my PR?
Uh that's strange. Try pushing new commits and I'll debug it if it keeps happening
Seems to work now, no worries.
Implemented all the suggestions, @carmocca. Should be good to review.
Argh, it all works fine with StableLM, but I just noticed that this causes issues with Falcon:
```
size mismatch for transformer.h.20.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.21.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.22.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.23.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.24.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.25.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.26.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.27.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.28.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.29.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.30.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.31.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
```
I think that's because of the multi-query attention. Any ideas for how to fix this?
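For what it's worth, the numbers line up with the multi-query math. A quick back-of-the-envelope check, assuming the fused QKV projection is sized as `(n_head + 2 * n_query_groups) * head_size` and Falcon-7B's dimensions (`n_embd=4544`, `n_head=71`, `head_size=64`, taken from the model card rather than from this PR):

```python
# Back-of-the-envelope check with assumed Falcon-7B dimensions.
n_embd, n_head, head_size = 4544, 71, 64

# Checkpoint: multi-query attention, i.e. a single shared key/value head.
print((n_head + 2 * 1) * head_size)       # 4672  -> the shape stored in the checkpoint

# Model as currently instantiated: one key/value pair per head (no multi-query).
print((n_head + 2 * n_head) * head_size)  # 13632 -> 3 * n_embd, the shape in the error
```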
I just noticed this also needs the ds_config for DeepSpeed. Will add this to the PR shortly.
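For reference, a minimal sketch of such a config passed to Fabric's `DeepSpeedStrategy` (the keys are standard DeepSpeed options; the values are placeholders, not necessarily what will land in the PR):

```python
# Placeholder sketch of a minimal DeepSpeed config handed to Fabric
# (values are illustrative, not necessarily what this PR ends up using).
from lightning.fabric import Fabric
from lightning.fabric.strategies import DeepSpeedStrategy

micro_batch_size = 4
gradient_accumulation_steps = 16

ds_config = {
    "train_micro_batch_size_per_gpu": micro_batch_size,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "zero_optimization": {"stage": 2},
}

fabric = Fabric(devices=2, strategy=DeepSpeedStrategy(config=ds_config))
```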
Should we also change this to FSDP before merging, @carmocca, or figure it out later?
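The alternative being discussed would look roughly like this (illustrative only): Fabric's native `FSDPStrategy` in place of DeepSpeed, which would also make the separate ds_config unnecessary.

```python
# Rough sketch of the FSDP alternative (illustrative only): Fabric's native
# FSDPStrategy replaces DeepSpeed, so no separate ds_config is needed.
from lightning.fabric import Fabric
from lightning.fabric.strategies import FSDPStrategy

fabric = Fabric(devices=2, strategy=FSDPStrategy())
```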
Besides FSDP and Falcon, everything should be addressed now. Thanks for the thorough review!