LoRA applied to all
This is an experiment (perhaps it should be a Draft?) to apply LoRA not only to the query and value matrices, but also to:
- query
- key
- value
- projection
- MLP
- head
as described in this issue (in the Lit-LLaMA repo).
Changes include porting the `Linear` class from Microsoft's LoRA repo so it can be used as a replacement for `nn.Linear`. `MergedLinear` handles the attention operations, `Linear` everything else.
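For reference, here is a minimal sketch of what such a `Linear` replacement does, following the design of Microsoft's loralib but simplified (no dropout, no merged-weight path); the class name is illustrative, not the exact code in this PR:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Linear):
    """Drop-in nn.Linear replacement computing W x + (B @ A) x * scaling."""

    def __init__(self, in_features: int, out_features: int, r: int = 0, lora_alpha: int = 1, **kwargs):
        super().__init__(in_features, out_features, **kwargs)
        self.r = r
        if r > 0:
            # A projects down to rank r, B projects back up. B starts at zero,
            # so at initialization the layer behaves exactly like nn.Linear.
            self.lora_A = nn.Parameter(torch.empty(r, in_features))
            self.lora_B = nn.Parameter(torch.zeros(out_features, r))
            nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
            self.scaling = lora_alpha / r
            # Only the low-rank matrices are trained; the base weight is frozen.
            self.weight.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        result = F.linear(x, self.weight, self.bias)
        if self.r > 0:
            result = result + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
        return result
```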
Experiments can now be run from the CLI by providing arguments:

```bash
python finetune/lora.py --checkpoint_dir ... --query_lora True --key_lora True --value_lora True --projection_lora True --mlp_lora True --head_lora True
```
The naming is lame, I know, so I need help with it.
By default (if no arguments are provided), the fine-tuning script applies LoRA only to query and value, so it mimics the previous behavior.
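To make the defaults explicit, here is a hypothetical sketch of the entry point's signature (the flag names mirror the CLI arguments above; the real signature in `finetune/lora.py` may differ):

```python
def setup(
    checkpoint_dir: str,
    # Only query and value default to True, so running the script
    # without flags reproduces the previous query+value-only behavior.
    query_lora: bool = True,
    key_lora: bool = False,
    value_lora: bool = True,
    projection_lora: bool = False,
    mlp_lora: bool = False,
    head_lora: bool = False,
) -> None:
    ...
```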
I don't have a GPU at my disposal, so I did a sanity check on my laptop's CPU and on a Google Colab GPU with the pythia-70m model and the Alpaca dataset. Someone with big guns from the Lightning.ai team should now check whether there are any improvements with a much, much bigger model, when time allows, of course.
Note: when I tested in Google Colab on an Nvidia T4 with `"16-mixed"` precision, I got:

`RuntimeError: probability tensor contains either inf, nan or element < 0`

so I tested with `"32-true"` instead; the T4 doesn't support bf16.
Still, it was weird. I understand why I might get this type of error with `16-true` (as explained in this article from the Lightning.ai blog), but with mixed precision it should work without problems, shouldn't it?
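As a workaround, one can pick the precision based on what the GPU actually supports. This is a minimal sketch assuming the script sets up Lightning Fabric and that a CUDA device is available; the variable names are illustrative:

```python
import torch
import lightning as L

# A T4 has no bf16 support, and "16-mixed" raised the RuntimeError above
# in my runs, so fall back to full precision on such cards.
precision = "bf16-mixed" if torch.cuda.is_bf16_supported() else "32-true"
fabric = L.Fabric(devices=1, precision=precision)
```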
P.S. The same issue (or is it not an issue?) also occurs with the code from the main branch.