LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Hi, in loralib's layer modules (https://github.com/microsoft/LoRA/blob/33b953630763c6299d2349abc8f154a3951a7984/loralib/layers.py#L138), it seems like the `eval()` function, which merges W + BA, is never called. This is because when changing the model to evaluation mode in torch by...
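For context on how the merge is usually triggered: in PyTorch, `model.eval()` dispatches to `train(False)`, so the merge can live in an overridden `train()` method rather than in a separate `eval()` hook. A minimal sketch of that pattern (simplified; not the exact loralib code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Linear):
    """Simplified sketch of a LoRA linear layer that merges W + BA on eval."""
    def __init__(self, in_features, out_features, r=4, lora_alpha=1):
        super().__init__(in_features, out_features)
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = lora_alpha / r
        self.merged = False

    def train(self, mode: bool = True):
        # model.eval() calls train(False), so the merge happens here.
        super().train(mode)
        if mode and self.merged:
            # back to training: un-merge so A and B can keep updating
            self.weight.data -= (self.lora_B @ self.lora_A) * self.scaling
            self.merged = False
        elif not mode and not self.merged:
            # eval: fold the low-rank update into the frozen weight
            self.weight.data += (self.lora_B @ self.lora_A) * self.scaling
            self.merged = True

    def forward(self, x):
        out = super().forward(x)
        if not self.merged:
            out = out + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
        return out
```

The merged and unmerged paths produce the same output; merging just removes the extra matmuls at inference time.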
I used the script roberta_base_sst2.sh to reproduce the result, but I can't find the LoRA matrix checkpoint. Ideally, there should be a 3.4MB bin file which contains the weights...
According to the paper, the LoRA parameter count for GPT-2 medium is 0.35M, but since the hidden dimension is 1024 and the model has 24 layers, with rank=4, the...
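For reference, a rough back-of-the-envelope count, assuming LoRA is applied only to the query and value projections (W_q, W_v) as in the paper's GPT-2 experiments; this gives roughly 0.39M, close to but not exactly the reported 0.35M, so the discrepancy presumably hinges on exactly which matrices are adapted:

```python
d_model = 1024    # hidden dimension of GPT-2 medium
n_layers = 24
r = 4             # LoRA rank

# Each adapted d x d weight gets A (r x d) and B (d x r): 2 * d * r params.
per_matrix = 2 * d_model * r            # 8192
# Applied to W_q and W_v in every layer:
total = per_matrix * 2 * n_layers
print(total)  # 393216, i.e. ~0.39M trainable parameters
```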
I didn't find anything about the "scale" input in the documentation. I would like to know how to change the "scale" input to adjust the LoRA method.
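For context, in loralib the low-rank update is multiplied by a scale factor of lora_alpha / r before being added to the frozen weight. A tiny illustration with hypothetical values:

```python
import torch

r, lora_alpha = 4, 16
d = 8
A = torch.randn(r, d)
B = torch.randn(d, r)

scaling = lora_alpha / r           # loralib's scale factor, here 4.0
delta_w = (B @ A) * scaling        # the update added to the frozen weight W
print(delta_w.shape)               # same shape as W: (8, 8)
```

So changing lora_alpha (or r) changes how strongly the adapted directions contribute relative to the pretrained weight.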
When I try to train the NLG model on multiple GPUs, I use this:
```
python -m torch.distributed.launch --nproc_per_node=2 --use_env src/gpt2_ft.py \
    --train_data ./data/e2e/train.jsonl \
    --valid_data ./data/e2e/valid.jsonl \
    --train_batch_size 8 \
    --grad_acc 1...
```
I am trying to use LoRA on a loaded checkpoint of a CodeT5 model. However, when I do, the runtime is about the same, and my result is not...
Hi, I am studying LoRA, and thanks for your work. I have a simple question which is really confusing me. Do the two hyper-parameters **lora-dim** of the GPT-2 model...
I am getting this result for these hyperparameters: https://github.com/microsoft/LoRA/tree/main/examples/NLG#replicating-our-result-on-e2e What were the hyperparameters for the results in the paper: https://github.com/microsoft/LoRA/tree/main/examples/NLG#adapting-gpt-2-using-lora
I am a beginner in deep learning, and I would like to know whether the reason for the gradient being 0 is vanishing gradients or if...
I have recently completed training a model using LoRA (referred to as LoRA-1) with Dataset A. I am now considering how best to proceed with training on a new Dataset...
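One common option, sketched below as plain tensor math (hypothetical shapes, not an official recommendation): merge the LoRA-1 update into the base weight, then attach fresh low-rank factors for the new dataset. The fresh pair is initialized so its update starts at zero (loralib-style: A random, B zero), so training on the new data begins from the LoRA-1-adapted model:

```python
import torch

d, r = 16, 4
W = torch.randn(d, d)             # frozen pretrained weight
A1 = torch.randn(r, d) * 0.01     # trained LoRA-1 factors (from Dataset A)
B1 = torch.randn(d, r) * 0.01

# Merge LoRA-1 into the base, then start LoRA-2 from scratch.
W_merged = W + B1 @ A1
A2 = torch.randn(r, d) * 0.01     # fresh factors for the new dataset;
B2 = torch.zeros(d, r)            # B2 = 0 makes the initial update zero

# Forward now uses W_merged + B2 @ A2; only A2 and B2 are trained.
effective = W_merged + B2 @ A2
```

The trade-off is that merging fixes LoRA-1's contribution permanently; keeping LoRA-1 as a separate adapter instead would let you switch it off later.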