FastChat
Applying LoRA to vicuna didn't reduce weight file size
Thank you very much for sharing your amazing work!
I applied LoRA to vicuna-13B, but it didn't reduce the weight file size. How come?
(i) vicuna-13B
(ii) LoRA-applied vicuna-13B (r=2)
I made the LoRA-applied vicuna by following these steps:
First, I generated the LoRA directory (which contains adapter_config.json and adapter_model.bin) with the following script.
from transformers import AutoModelForCausalLM
import peft

model = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b")
model.enable_input_require_grads()
model.gradient_checkpointing_enable()
peft_config = peft.LoraConfig(
    task_type=peft.TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    inference_mode=True,
)
model = peft.get_peft_model(model, peft_config)
model.print_trainable_parameters()
model.save_pretrained('/path/to/lora/directory/')
Then I ran python3 -m fastchat.model.apply_lora --base /path/to/vicuna-13b --target /output/path/ --lora /path/to/lora/directory/ (at this point, I followed this dependency).
I think that by applying LoRA you get an extra LoRA model as a plugin, without modifying any of the original weights :)
It does produce a so-called adapter; however, you can merge it into the existing model:
python3 -m fastchat.model.apply_lora --base /path/to/vicuna-13b --target /output/path/ --lora /path/to/lora/directory/
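For reference, here is a rough sketch of the same merge done directly with the peft API (this is not necessarily what apply_lora does internally; the paths follow the ones used above):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, attach the adapter, then fold the low-rank updates into the base weights.
base = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b", torch_dtype=torch.float16)
lora = PeftModel.from_pretrained(base, "/path/to/lora/directory/")
merged = lora.merge_and_unload()
merged.save_pretrained("/output/path/")
AutoTokenizer.from_pretrained("/path/to/vicuna-13b").save_pretrained("/output/path/")

Note that the merged model has exactly the same shape as the base model, so its checkpoint is just as large; only the stand-alone adapter (adapter_model.bin) is small.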
I am also facing the same issue. Investigating the peft library.
I have experience with PEFT, LoRA, and AdaLoRA, but I haven't used the script here to train any PEFT weights. I think the LoRA adapter is saved as a separate model that contains the low-rank weights. It is a separate binary file, which can't be "merged" into the original weights.
For loading the original weights and the adapter's weights via the peft API, you can take a reference from here: peft_adalora_seq2seq.py
Basically, it does the following:
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftConfig, PeftModel

model_name_or_path = "/path/to/lora/directory/"  # the LoRA adapter directory
peft_model_id = f"{model_name_or_path}"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)  # change the auto class to match your base model (e.g. AutoModelForCausalLM for vicuna)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)  # attach the adapter weights on top of the base weights
model.eval()
inputs = tokenizer("example prompt", return_tensors="pt")  # example input
with torch.no_grad():
    outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=32)  # example generation length
    print(outputs)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
Yes, there are two options: load the base model and apply the so-called LoRA adapter, or merge the adapter into the base model.
I did one experiment:
- Trained base model with LoRA training script for 1 epoch.
- Loss dropped from ~4 to ~2.
- Stopped training and merged adapter to base model.
- Resumed training with merged base model.
Guess what the loss was? It was ~2 from the start and continued dropping. After reaching a loss of around 0.08, training finished. However, when I load the second merged model, it still doesn't show the training results. It looks like I haven't trained at all.
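If you want to check whether the merge actually changed anything, one way (a sketch, assuming a llama-style module layout and the paths used earlier in this thread) is to compare a merged weight against the corresponding base weight:

import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b", torch_dtype=torch.float16)
merged = AutoModelForCausalLM.from_pretrained("/output/path/", torch_dtype=torch.float16)
# Compare one of the LoRA target modules (q_proj); a difference of ~0 means the adapter was never merged in.
diff = (merged.model.layers[0].self_attn.q_proj.weight - base.model.layers[0].self_attn.q_proj.weight).abs().max()
print(diff)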
I added model.save_pretrained('/output/path/') after model.eval() in the code above, but it didn't solve this problem and didn't reduce the weight file size.
Please check here if you only want to store the LoRA adapter part. Basically, the state dict contains every weight, including those that are not trainable in LoRA, so you need to pick out the ones created by LoRA and store only those.
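A minimal sketch of that filtering, assuming a model wrapped with get_peft_model as in the script above (the helper below is illustrative, not FastChat's exact code): keys created by LoRA contain "lora_", and with bias="all" the bias terms are kept as well.

import torch

def lora_only_state_dict(model, bias="none"):
    # Keep only the parameters that LoRA added; optionally keep bias terms too.
    full = model.state_dict()
    if bias == "all":
        return {k: v for k, v in full.items() if "lora_" in k or "bias" in k}
    return {k: v for k, v in full.items() if "lora_" in k}

torch.save(lora_only_state_dict(model, bias="all"), "/path/to/lora/directory/adapter_model.bin")

peft's get_peft_model_state_dict utility does essentially this selection for you, based on the bias setting in LoraConfig.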
With LoRA bias "all", the expected result is received. Thank you :+1: