
Applying LoRA to vicuna didn't reduce weight file size

Open takeshiho0531 opened this issue 1 year ago • 8 comments

Thank you very much for sharing your amazing work!

I applied LoRA to vicuna-13B, but it didn't reduce the weight file size. How come?

(i) vicuna-13B: [screenshot 2023-04-28 17:24:05]

(ii) LoRA-applied vicuna-13B (r=2): [screenshot 2023-04-28 17:24:45]

takeshiho0531 · Apr 28 '23 08:04

I made the LoRA-applied vicuna by following the steps below.

First, I generated the LoRA directory (which contains adapter_config.json and adapter_model.bin) with the following script:

from transformers import AutoModelForCausalLM
import peft

model = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b")

model.enable_input_require_grads()
model.gradient_checkpointing_enable()

peft_config = peft.LoraConfig(
    task_type=peft.TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    inference_mode=True,
)

model = peft.get_peft_model(model, peft_config)
model.print_trainable_parameters()

model.save_pretrained('/path/to/lora/directory/')

Then I ran python3 -m fastchat.model.apply_lora --base /path/to/vicuna-13b --target /output/path/ --lora /path/to/lora/directory/ (I followed this dependency at the time).

takeshiho0531 · Apr 28 '23 08:04

I think by applying LoRA, you will get an extra lora model as a plugin, without modifying any of the original weights :)
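
A quick way to see that on disk is to compare the size of the adapter file with the base checkpoint shards (a rough sketch; the paths are the placeholders used elsewhere in this thread):

import os

# Placeholder paths matching the ones used earlier in this thread.
adapter_file = "/path/to/lora/directory/adapter_model.bin"
base_dir = "/path/to/vicuna-13b"

adapter_mb = os.path.getsize(adapter_file) / 1e6
base_gb = sum(
    os.path.getsize(os.path.join(base_dir, f))
    for f in os.listdir(base_dir)
    if f.endswith(".bin")
) / 1e9

# The adapter holds only the low-rank matrices, so it is small (MBs to tens of MBs);
# the base checkpoint shards are untouched and stay tens of GBs.
print(f"adapter: {adapter_mb:.1f} MB, base model: {base_gb:.1f} GB")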

wang-yiwei · Apr 28 '23 10:04

I think by applying LoRA, you will get an extra lora model as a plugin, without modifying any of the original weights :)

It does produce a so-called adapter; however, you can merge it into the existing model: python3 -m fastchat.model.apply_lora --base /path/to/vicuna-13b --target /output/path/ --lora /path/to/lora/directory/
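
Roughly the same merge can also be done directly with the peft API (a sketch, assuming a peft version that provides merge_and_unload; the paths are placeholders):

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, attach the saved adapter, then fold the low-rank
# update into the dense weights and drop the adapter wrappers.
base = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "/path/to/lora/directory/")
merged = model.merge_and_unload()

# The merged model has the same shapes as the base model, so the saved
# checkpoint is the same size as the original weights. LoRA only makes the
# trainable delta small; it does not compress the model itself.
merged.save_pretrained("/output/path/")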

pauliustumas · Apr 28 '23 13:04

I made the LoRA-applied vicuna by following the steps below.

First, I generated the LoRA directory (which contains adapter_config.json and adapter_model.bin) with the following script:

from transformers import AutoModelForCausalLM
import peft

model = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b")

model.enable_input_require_grads()
model.gradient_checkpointing_enable()

peft_config = peft.LoraConfig(
    task_type=peft.TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    inference_mode=True,
)

model = peft.get_peft_model(model, peft_config)
model.print_trainable_parameters()

model.save_pretrained('/path/to/lora/directory/')

Then I ran python3 -m fastchat.model.apply_lora --base /path/to/vicuna-13b --target /output/path/ --lora /path/to/lora/directory/ (I followed this dependency at the time).

I am also facing the same issue. Investigating the peft library.

pauliustumas · Apr 28 '23 13:04

I think by applying LoRA, you will get an extra lora model as a plugin, without modifying any of the original weights :)

It does produce a so-called adapter; however, you can merge it into the existing model: python3 -m fastchat.model.apply_lora --base /path/to/vicuna-13b --target /output/path/ --lora /path/to/lora/directory/

I have experience with peft, LoRA and AdaLoRA, but I haven't used the script here to train any peft weights. I think the LoRA adapter will be saved as a separate model that contains the low-rank weights. It is a separate binary file, which cannot be "merged" into the original weights.

For loading the original weights and the adapter's weights via the peft API, you can refer to this example: peft_adalora_seq2seq.py

Basically it does the following:

peft_model_id = f"{model_name_or_path}"  # path to the saved adapter directory

config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)   # change the auto class (AutoModelForCausalLM for vicuna)
model = PeftModel.from_pretrained(model, peft_model_id)

model.eval()

with torch.no_grad():
    # supply your own tokenized prompt (input_ids) and generation length here
    outputs = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
    print(outputs)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
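
(A minimal way to produce those inputs, continuing from the snippet above and using a made-up prompt, could be:)

import torch
from transformers import AutoTokenizer

# Hypothetical prompt, only to show how input_ids for generate() can be built.
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
inputs = tokenizer("Hello, how are you?", return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))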

wang-yiwei · Apr 28 '23 14:04

I think by applying LoRA, you will get an extra lora model as a plugin, without modifying any of the original weights :)

It does produce a so-called adapter; however, you can merge it into the existing model: python3 -m fastchat.model.apply_lora --base /path/to/vicuna-13b --target /output/path/ --lora /path/to/lora/directory/

I have experience with peft, LoRA and AdaLoRA, but I haven't used the script here to train any peft weights. I think the LoRA adapter will be saved as a separate model that contains the low-rank weights. It is a separate binary file, which cannot be "merged" into the original weights.

For loading the original weights and the adapter's weights via the peft API, you can refer to this example: peft_adalora_seq2seq.py

Basically it does the following:

peft_model_id = f"{model_name_or_path}"  # path to the saved adapter directory

config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)   # change the auto class (AutoModelForCausalLM for vicuna)
model = PeftModel.from_pretrained(model, peft_model_id)

model.eval()

with torch.no_grad():
    # supply your own tokenized prompt (input_ids) and generation length here
    outputs = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
    print(outputs)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))

Yes, there are two options: load the base model and apply the so-called LoRA adapter, or merge the adapter into the base model.

I did one experiment:

  • Trained the base model with the LoRA training script for 1 epoch.
  • Loss dropped from ~4 to ~2.
  • Stopped training and merged the adapter into the base model.
  • Resumed training with the merged base model.

Guess what the loss was? It was ~2 from the start and continued dropping, and after reaching a loss of around 0.08 the training finished. However, when I load the second merged model, it still doesn't show the training results. It looks as if I hadn't trained at all.
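
One way to sanity-check whether the merge actually changed anything is to compare one of the target tensors before and after merging (a sketch with placeholder paths, assuming a LLaMA-style module layout and a peft version that provides merge_and_unload):

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b", torch_dtype=torch.float16)

# Snapshot one of the LoRA target tensors before merging (q_proj is a target module).
before = base.model.layers[0].self_attn.q_proj.weight.detach().clone()

merged = PeftModel.from_pretrained(base, "/path/to/lora/directory/").merge_and_unload()
after = merged.model.layers[0].self_attn.q_proj.weight.detach()

# If the trained adapter was saved correctly, the merged tensor should differ.
print("max abs diff:", (after - before).abs().max().item())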

pauliustumas · Apr 28 '23 16:04

For loading the original weights and the adapter's weights via the peft API, you can refer to this example: peft_adalora_seq2seq.py

Basically it does the following:

peft_model_id = f"{model_name_or_path}"  # path to the saved adapter directory

config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)   # change the auto class (AutoModelForCausalLM for vicuna)
model = PeftModel.from_pretrained(model, peft_model_id)

model.eval()

with torch.no_grad():
    # supply your own tokenized prompt (input_ids) and generation length here
    outputs = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
    print(outputs)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))

I did model.save_pretrained('/output/path/') after model.eval() in the code quoted above in this comment, but it didn't solve the problem and didn't reduce the weight file size.

takeshiho0531 · Apr 29 '23 18:04

Please check here if you only want to store the LoRA adapter part. Basically, the state dict contains every weight, including those that are not trainable under LoRA, so you need to pick out the weights created by LoRA and store only those.
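
For example, peft ships a helper that filters the state dict down to the LoRA-created tensors (a sketch; model here is the PeftModel from the training script above, and the output path is a placeholder):

import torch
from peft.utils import get_peft_model_state_dict

# Keep only the LoRA tensors (lora_A / lora_B, plus biases depending on the
# `bias` setting in LoraConfig) instead of the full model state dict.
lora_state_dict = get_peft_model_state_dict(model)
print(len(lora_state_dict), "LoRA tensors")

torch.save(lora_state_dict, "/path/to/lora/directory/adapter_model.bin")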

ZYHowell · Apr 30 '23 01:04

Please check here if you only want to store the LoRA adapter part. Basically, the state dict contains every weight, including those that are not trainable under LoRA, so you need to pick out the weights created by LoRA and store only those.

With LoRA bias "all", I got the expected result. Thank you :+1:
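
For reference, in peft that option lives on LoraConfig (a sketch mirroring the config from earlier in the thread):

import peft

# Same settings as the earlier config, but with bias="all" so that bias terms
# are also treated as part of the adapter and included in the extracted state dict.
peft_config = peft.LoraConfig(
    task_type=peft.TaskType.CAUSAL_LM,  # vicuna is a causal (decoder-only) LM
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="all",
)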

pauliustumas · May 01 '23 07:05