Implement LoRA for efficient finetuning
Implements LoRA for efficient finetuning of parrot models
- [x] add finetuning script
- [x] add the howto guide
- [x] add tests
- [x] add generate script
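For context, a minimal sketch of the LoRA idea being added here (illustrative only, not the exact code in this PR): the pretrained weight is frozen and only a low-rank update, scaled by `alpha / r`, is trained.

```python
# Minimal LoRA sketch (illustrative, not this PR's implementation): freeze the
# pretrained linear layer and train only a low-rank update scaled by alpha / r.
import math
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        for p in self.linear.parameters():
            p.requires_grad = False  # pretrained weight (and bias) stay frozen
        # Low-rank factors: A is (r, in), B is (out, r). B starts at zero so the
        # adapted layer initially behaves exactly like the pretrained one.
        self.lora_A = nn.Parameter(torch.empty(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```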
Just ran a full Alpaca finetuning round for StableLM 3B and the results look great: the loss converges to ~1 and it generates very sensible outputs.
It takes 45 min on an A100 for Alpaca.
Should be ready to review (and hopefully merge) when you have time @lantiga @awaelchli @carmocca
Thanks for the review, I will try to address these cases tonight / tomorrow morning. Btw, a question regarding the CI: it looks like it got automatically canceled. Or is there an issue with my PR?
Uh that's strange. Try pushing new commits and I'll debug it if it keeps happening
Seems to work now, no worries.
Implemented all the suggestions, @carmocca. Should be good to review.
Argh, it all works fine with StableLM, but I just noticed that this causes issues with Falcon:
```
size mismatch for transformer.h.20.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.21.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.22.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.23.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.24.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.25.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.26.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.27.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.28.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.29.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.30.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
size mismatch for transformer.h.31.attn.attn.weight: copying a param with shape torch.Size([4672, 4544]) from checkpoint, the shape in current model is torch.Size([13632, 4544]).
```
I think that's because of the multi-query attention. Any ideas for how to fix this?
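For what it's worth, the numbers line up with the multi-query math. A quick back-of-the-envelope check, assuming the fused QKV projection is sized as `(n_head + 2 * n_query_groups) * head_size` and Falcon-7B's dimensions (`n_embd=4544`, `n_head=71`, `head_size=64`, taken from the model card rather than from this PR):

```python
# Back-of-the-envelope check with assumed Falcon-7B dimensions.
n_embd, n_head, head_size = 4544, 71, 64

# Checkpoint: multi-query attention, i.e. a single shared key/value head.
print((n_head + 2 * 1) * head_size)       # 4672  -> the shape stored in the checkpoint

# Model as currently instantiated: one key/value pair per head (no multi-query).
print((n_head + 2 * n_head) * head_size)  # 13632 -> 3 * n_embd, the shape in the error
```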
I just noticed this also needs the ds_config for DeepSpeed. Will add this to the PR shortly.
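For reference, a minimal sketch of such a config passed to Fabric's `DeepSpeedStrategy` (the keys are standard DeepSpeed options; the values are placeholders, not necessarily what will land in the PR):

```python
# Placeholder sketch of a minimal DeepSpeed config handed to Fabric
# (values are illustrative, not necessarily what this PR ends up using).
from lightning.fabric import Fabric
from lightning.fabric.strategies import DeepSpeedStrategy

micro_batch_size = 4
gradient_accumulation_steps = 16

ds_config = {
    "train_micro_batch_size_per_gpu": micro_batch_size,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "zero_optimization": {"stage": 2},
}

fabric = Fabric(devices=2, strategy=DeepSpeedStrategy(config=ds_config))
```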
Should we also change this to FSDP before merging, @carmocca, or figure it out later?
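The alternative being discussed would look roughly like this (illustrative only): Fabric's native `FSDPStrategy` in place of DeepSpeed, which would also make the separate ds_config unnecessary.

```python
# Rough sketch of the FSDP alternative (illustrative only): Fabric's native
# FSDPStrategy replaces DeepSpeed, so no separate ds_config is needed.
from lightning.fabric import Fabric
from lightning.fabric.strategies import FSDPStrategy

fabric = Fabric(devices=2, strategy=FSDPStrategy())
```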
Besides FSDP and Falcon, everything should be addressed now. Thanks for the thorough review!