
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Results: 106 LoRA issues

(gh_LoRA) ub2004@ub2004-B85M-A0:~/llm_dev/LoRA/examples/NLG$ python3 -m torch.distributed.launch --nproc_per_node=1 src/gpt2_ft.py --train_data ./data/e2e/train.jsonl --valid_data ./data/e2e/valid.jsonl --train_batch_size 8 --grad_acc 1 --valid_batch_size 4 --seq_len 512 --model_card gpt2.md --init_checkpoint ./pretrained_checkpoints/gpt2-medium-pytorch_model.bin --platform local --clip 0.0 --lr 0.0002 --weight_decay...

How can one convert Megatron-DeepSpeed's ColumnParallelLinear and RowParallelLinear into LoRA linear layers? ColumnParallelLinear is defined in: https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/mpu/layers.py#L206
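For context, the general pattern being asked about can be sketched in plain PyTorch. This is a minimal, hypothetical LoRALinear (not loralib's actual class): a frozen base linear plus a trainable low-rank update. Adapting it to ColumnParallelLinear / RowParallelLinear would additionally require sharding lora_A / lora_B consistently with how the base weight is partitioned across ranks, which this sketch does not show.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal sketch: frozen base linear + trainable low-rank update."""

    def __init__(self, in_features, out_features, r=4, alpha=1.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        # A ~ small Gaussian, B = 0, so the update starts at zero
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # base output plus scaled low-rank correction x @ A^T @ B^T
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(16, 32, r=4)
x = torch.randn(2, 16)
y = layer(x)
```

Because lora_B is initialized to zero, the layer initially computes exactly the frozen base layer's output.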

I thought this might be interesting as an alternate implementation of LoRA, leveraging tensor subclasses and reparametrization: https://gist.github.com/Chillee/a8d2070b1b7b3f97d8c87bac3c366f8e The main idea here is that we can leverage parametrization in order...
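A minimal sketch of that idea using PyTorch's built-in `torch.nn.utils.parametrize` (this is an illustrative assumption about the gist's approach, not its actual code): the low-rank update is attached as a parametrization of the frozen weight, so the forward pass of the unmodified `nn.Linear` transparently sees `W + B @ A`.

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class LoRAParametrization(nn.Module):
    """Adds a scaled low-rank update B @ A on top of a frozen base weight."""

    def __init__(self, out_features, in_features, r=4, alpha=1.0):
        super().__init__()
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, W):
        # W is the original weight; the sum is recomputed on each access
        return W + self.scaling * (self.lora_B @ self.lora_A)

layer = nn.Linear(16, 32)
layer.weight.requires_grad_(False)  # freeze the pretrained weight
parametrize.register_parametrization(layer, "weight", LoRAParametrization(32, 16))

x = torch.randn(2, 16)
y = layer(x)  # uses W + scaling * (B @ A) without touching nn.Linear's code
```

One appeal of this design is that the base module is untouched; merging for inference is a matter of removing the parametrization with `leave_parametrized=True`.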

Hi, thank you for this really nice paper. This is not an issue but a general question: why are there both a Linear and a MergedLinear class? Thank you, Maxime.

Hi, just wanted to point out that the name LoRa is already [used](https://lora-alliance.org/) and could cause confusion.

Excuse me, has the LoRA paper been accepted yet? Thank you

I found that the parameter initialization in **reset_parameters()** of the **Embedding** class in **layers.py** differs from the LoRA paper and other implementations. I initialized lora_A with **nn.init.normal_()** while lora_B with...
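For reference, the paper's scheme pairs a Gaussian-initialized A with a zero-initialized B, so the low-rank update B @ A is exactly zero at the start of fine-tuning. A minimal sketch (the std value here is an illustrative choice, not taken from the paper or loralib):

```python
import torch
import torch.nn as nn

r, in_features, out_features = 4, 64, 64
lora_A = nn.Parameter(torch.empty(r, in_features))
lora_B = nn.Parameter(torch.empty(out_features, r))

# Paper-style initialization: Gaussian A, zero B
nn.init.normal_(lora_A, std=0.02)  # illustrative std
nn.init.zeros_(lora_B)

delta = lora_B @ lora_A  # the initial low-rank update
```

With either factor set to zero, the product starts at zero and the adapted model initially matches the pretrained one; the question above is about the two factors' roles being swapped in the Embedding class.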

The paper says that it only needs 350GB of VRAM to train the 175B-parameter GPT-3 with rank = 4. Can you elaborate on how this is done? For example, do you use Megatron-DeepSpeed?...

Is it possible to use LoRA to fine-tune GPT-NeoX-20B?

Hi, thank you for sharing the source code. I really enjoy the work you propose. While reading the paper and reproducing the results, I had a couple of questions: 1....