
LORA not applying at all

Open nephi-dev opened this issue 1 year ago • 9 comments

Describe the bug

For some reason, when I train a LoRA and apply it to a model (in this case, the same model I trained it on), nothing changes in the output.

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

  1. Check a model's output with an example prompt;
  2. train a LoRA;
  3. apply it to the model;
  4. repeat the first step with the LoRA applied (a scripted version of these steps is sketched below).
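For reference, here is a minimal sketch of the same before/after comparison done outside the web UI with transformers and peft; the model path, LoRA path, and prompt are placeholders, not taken from the original report, and an environment with accelerate installed is assumed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "models/my-base-model"   # placeholder path
lora_path = "loras/my-trained-lora"  # placeholder path
prompt = "Example text used in step 1"

tokenizer = AutoTokenizer.from_pretrained(base_path)
model = AutoModelForCausalLM.from_pretrained(base_path, device_map="auto")

def generate(m):
    # Greedy decoding so the comparison is deterministic
    inputs = tokenizer(prompt, return_tensors="pt").to(m.device)
    out = m.generate(**inputs, max_new_tokens=64, do_sample=False)
    return tokenizer.decode(out[0], skip_special_tokens=True)

before = generate(model)                              # step 1: base model output
model = PeftModel.from_pretrained(model, lora_path)   # step 3: apply the LoRA
after = generate(model)                               # step 4: same prompt again

print("Output changed:", before != after)             # expected True if the LoRA applies
```

If the two outputs are identical here as well, the problem is not specific to the web UI.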

Screenshot

(Five screenshots attached: 001–005)

Logs

No errors

System Info

win 11
nvidia rtx 4060 ti 16gb

nephi-dev avatar Jan 05 '24 21:01 nephi-dev

I've noticed something similar too. I found a good 4 bit GPTQ model and I trained a LoRA for it, which worked great. I wanted to see what the unquantized model would be like by comparison, so I downloaded that and applied the same LoRA, and it had (close to?) no effect.

I tried LoRA training the full-sized model directly, using the quantization options on the Transformers loader to make it small enough to fit. The LoRA training seemed to go well, but applying the resulting LoRA to the model (I can't remember whether I quantized it when doing inference with the LoRA) again seemed to have no real effect on the output.

Something is messed up somewhere. Did you click on any 'quantize on the fly' options in your case, either when training or when using the trained LoRA?

araleza avatar Jan 06 '24 14:01 araleza

Although one thing worth noting is that LoRAs are not great at training to add knowledge (such as 'who Sona is'). They mostly do style and personality. Using the character sheet context is much better at adding facts. Some people have also been trying DPO to add knowledge too.

araleza avatar Jan 06 '24 14:01 araleza

> Although one thing worth noting is that LoRAs are not great at training to add knowledge (such as 'who Sona is'). They mostly do style and personality. Using the character sheet context is much better at adding facts. Some people have also been trying DPO to add knowledge too.

Good point, but this has worked before and has only now stopped working.

nephi-dev avatar Jan 09 '24 19:01 nephi-dev

I noticed today that if I apply a LoRA to a 4-bit GPTQ model loaded with the Transformers model loader, it has no effect. But if I load that same model with ExLlamav2_HF loader and apply it, it works.

So I can only train with the Transformers loader, and apply with the ExLlamav2_HF loader. Seems buggy to me.

araleza avatar Jan 09 '24 19:01 araleza

I have the same issue. When I'm applying LoRA over a model loaded with Transformers, the LoRA doesn't apply even if it says that it was applied successfully. The model outputs the same way the base model does.

I noticed it after updating text-generation-webui today; it worked correctly just before the update. I don't remember which version I was on before updating, but I hadn't updated in around a month.

Applying LoRA over the safetensors version of the model with ExLlamav2_HF works for me too, but in the case of Transformers, it is certainly a bug.

ArakiSatoshi avatar Jan 09 '24 22:01 ArakiSatoshi

Okay, so I started learning a bit about LoRA coding to try to fix this. I definitely haven't fixed it yet, but I have found a workaround (and I'm posting it here in case someone who actually knows this code can spot what's really wrong).

I went into the file 'modules/LoRA.py' (a subdirectory of the main text-generation-webui directory), and edited this line (line 124 in the current version of this file):

shared.model = PeftModel.from_pretrained(shared.model, get_lora_path(lora_names[0]), adapter_name=lora_names[0], **params)

...by deleting the adapter_name parameter, so the line is now like this:

shared.model = PeftModel.from_pretrained(shared.model, get_lora_path(lora_names[0]), **params)

And for some reason the LoRA now successfully applies again to my 4-bit GPTQ model loaded with the Transformers loader. (Edit: I've now also tested this by applying a LoRA to an 8-bit GPTQ model loaded with the Transformers loader, where the LoRA was trained on a 4-bit version. This also works.)

Remember to restart the webui server (refreshing the web page isn't enough) if you decide to change this line yourself while webui is running.

I hope someone who knows their way around the LoRA/Transformers code recognizes what this might mean.
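In case it helps anyone debug this: below is a small sketch (not the webui's own code; the paths and adapter name are placeholders) of how to inspect which adapters PEFT has registered and which one is active after loading, which is one way to tell whether the adapter_name path is the part that goes wrong.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("models/my-base-model")  # placeholder path
model = PeftModel.from_pretrained(
    base_model,
    "loras/my-trained-lora",         # placeholder LoRA path
    adapter_name="my-trained-lora",  # the parameter removed in the workaround above
)

print(model.peft_config.keys())  # adapters registered on this model
print(model.active_adapter)      # adapter currently used for forward passes

# If the adapter ends up registered under an unexpected name, selecting it
# explicitly is another thing worth trying:
model.set_adapter("my-trained-lora")
```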

araleza avatar Jan 12 '24 19:01 araleza

I use LoRA all the time. You need to specify which loader you are using and what format the model is in. I see GPTQ mentioned - that still requires saying which loader: AutoGPTQ, ExLlama, ExLlama 2...

FartyPants avatar Jan 17 '24 06:01 FartyPants

I am having a similar issue. I am unfamiliar with GitHub issue reporting/commenting so let me know if I need to provide more info.

My issue happens when I load a regular unquantized LLaMA v2 model (13b-chat-hf) through the regular Transformers loader onto my GPU and use the automatic quantization. If I select load-in-8bit or load-in-4bit, the LoRAs do not work at all. If I let it load unquantized, the LoRA works (but very slowly). Loading on CPU works as well, presumably for the same reason.

Not sure if that helps at all, but hopefully it provides some extra insight.
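For reference, a rough sketch of the configuration described above, done outside the UI; the repo id and LoRA path are assumptions, and recent transformers, peft, and bitsandbytes are assumed:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",                           # assumed repo id for 13b-chat-hf
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # mirrors the load-in-8bit option
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "loras/my-trained-lora")  # placeholder LoRA path

# In the behaviour described above, this quantized load generates like the base
# model, while the same code without quantization_config reflects the LoRA.
```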

reedmayhew18 avatar Feb 19 '24 19:02 reedmayhew18

Do I understand correctly that if I don't have a video card and only use the CPU, I can't use LORA in text-generation-webui in any way?

Alekkc avatar Apr 13 '24 19:04 Alekkc

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] avatar Jun 12 '24 23:06 github-actions[bot]