LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

106 LoRA issues, sorted by recently updated

Hi, in loralib's layer modules (https://github.com/microsoft/LoRA/blob/33b953630763c6299d2349abc8f154a3951a7984/loralib/layers.py#L138), it seems like the `eval()` function, which merges W + BA, is never called. This is because when changing the model to evaluation mode in torch by...
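
A note on why this happens: in PyTorch, `nn.Module.eval()` simply calls `self.train(False)`, and it is `train(mode)` that recurses into child modules, so an overridden `eval()` on a submodule is never reached. A minimal sketch of the merge-on-eval pattern (a toy class for illustration, not loralib's actual implementation):

```
import torch
import torch.nn as nn

class ToyLoRALinear(nn.Linear):
    """Toy layer showing where merging W + BA has to live.
    (The forward pass with the unmerged LoRA branch is omitted for brevity.)"""
    def __init__(self, in_features, out_features, r=4, lora_alpha=1):
        super().__init__(in_features, out_features)
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = lora_alpha / r
        self.merged = False

    # Override train(), not eval(): nn.Module.eval() just calls
    # self.train(False), and only train(mode) recurses to children.
    def train(self, mode: bool = True):
        super().train(mode)
        if not mode and not self.merged:      # entering eval: merge
            self.weight.data += (self.lora_B @ self.lora_A) * self.scaling
            self.merged = True
        elif mode and self.merged:            # back to train: unmerge
            self.weight.data -= (self.lora_B @ self.lora_A) * self.scaling
            self.merged = False
        return self
```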

I used the bash script roberta_base_sst2.sh to reproduce the result, but I can't find the LoRA matrix checkpoint. Ideally, there should be a 3.4MB bin file which contains the weights...
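
For anyone hitting the same thing: loralib provides `lora_state_dict` to extract just the LoRA parameters, which is what produces the small checkpoint. A sketch, with placeholder file names and assuming `model` was built with loralib layers:

```
import torch
import loralib as lora

# Save only the LoRA matrices (a few MB) instead of the full model;
# lora_state_dict filters the state dict to the 'lora_' parameters.
torch.save(lora.lora_state_dict(model), 'checkpoint_lora.bin')

# To restore later: load the pretrained weights first, then the LoRA deltas.
model.load_state_dict(torch.load('pretrained.bin'), strict=False)
model.load_state_dict(torch.load('checkpoint_lora.bin'), strict=False)
```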

According to the paper, the LoRA parameter count for GPT-2 medium is 0.35M, but since the hidden dimension is 1024 and the model has 24 layers, with rank=4, the...
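
For reference, the usual back-of-envelope count for this setting (LoRA on Wq and Wv only, as in the paper's GPT-2 experiments) lands near, but not exactly at, 0.35M, which seems to be the discrepancy being asked about:

```
# Back-of-envelope count, using the figures from the question above.
d_model = 1024    # GPT-2 medium hidden size
n_layers = 24
r = 4             # LoRA rank
# Each adapted d x d matrix adds A (r x d) and B (d x r) parameters:
per_matrix = r * d_model + d_model * r    # 8,192
per_layer = 2 * per_matrix                # Wq and Wv -> 16,384
print(n_layers * per_layer)               # 393,216, i.e. ~0.39M
```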

I didn't find anything about the "scale" input. I would like to know how to change the "scale" input to adjust the LoRA method.
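
For what it's worth, in loralib the scale is not a direct input: the layers take `lora_alpha` and compute `scaling = lora_alpha / r`, which multiplies the low-rank update in the forward pass. A small example:

```
import loralib as lora

# scaling = lora_alpha / r; here 16 / 8 = 2.0. The forward pass adds
# (x @ A^T @ B^T) * scaling on top of the frozen linear output.
layer = lora.Linear(768, 768, r=8, lora_alpha=16)
print(layer.scaling)  # 2.0

# To change the effective scale, adjust lora_alpha (or r) when
# constructing the layer.
```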

When I try to train the NLG model on multiple GPUs, I use this:

```
python -m torch.distributed.launch --nproc_per_node=2 --use_env src/gpt2_ft.py \
    --train_data ./data/e2e/train.jsonl \
    --valid_data ./data/e2e/valid.jsonl \
    --train_batch_size 8 \
    --grad_acc 1...
```

I am trying to use LoRA on a loaded checkpoint of a CodeT5 model. However, when I do, the runtime is about the same, and my result is not...
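
One pitfall worth ruling out in cases like this: LoRA does not reduce forward/backward FLOPs, so per-step wall time can stay roughly the same; the savings are in trainable parameters and optimizer state, and only if the base model is actually frozen. A quick sanity check with a toy model:

```
import torch.nn as nn
import loralib as lora

# Toy model: only the LoRA matrices should remain trainable.
model = nn.Sequential(
    lora.Linear(128, 128, r=8),  # LoRA-adapted projection
    nn.Linear(128, 2),           # frozen by the call below
)
lora.mark_only_lora_as_trainable(model)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total}")  # a small fraction of the total
```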

Hi, I am studying LoRA, and thanks for your work. I have a simple question which is really confusing me. Do the two hyper-parameters, **lora-dim** of the GPT-2 model...

Getting this result for the hyperparameters at https://github.com/microsoft/LoRA/tree/main/examples/NLG#replicating-our-result-on-e2e. What were the hyperparameters for the results in the paper (https://github.com/microsoft/LoRA/tree/main/examples/NLG#adapting-gpt-2-using-lora)?

I am a beginner in deep learning and I would like to know if the reason for the gradient being 0 is vanishing gradients or if...

I have recently completed training a model using LoRA (referred to as LoRA-1) with Dataset A. I am now considering how best to proceed with training on a new Dataset...
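
One possible recipe (a sketch of one option, not an official procedure): merge LoRA-1 into the base weights, then train a fresh adapter on the new dataset. File names are placeholders, and `model` is assumed to be built with loralib layers using the default `merge_weights=True`:

```
import math
import torch
import torch.nn as nn
import loralib as lora

# Load the base weights, then the LoRA-1 deltas on top.
model.load_state_dict(torch.load('base.bin'), strict=False)
model.load_state_dict(torch.load('lora_1.bin'), strict=False)
model.eval()  # with merge_weights=True, this folds W + BA*scaling into W

# Re-initialize the adapter the way loralib does (random A, zero B) so the
# new adapter starts from a zero delta on top of the merged weights.
for module in model.modules():
    if hasattr(module, 'lora_A'):
        nn.init.kaiming_uniform_(module.lora_A, a=math.sqrt(5))
        nn.init.zeros_(module.lora_B)

lora.mark_only_lora_as_trainable(model)
model.train()  # the unmerge step subtracts the new (zero) delta, a no-op
```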