FastChat
Applying LoRA to vicuna didn't reduce weight file size
Thank you very much for sharing your amazing work!
I applied LoRA to vicuna-13B, but it didn't reduce the weight file size. How come?
(i) vicuna-13B
(ii) LoRA-applied vicuna-13B (r=2)
I made the LoRA-applied vicuna by following these steps:
First, I generated the LoRA directory (which contains adapter_config.json and adapter_model.bin) with the following script.
from transformers import AutoModelForCausalLM
import peft

model = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b")
model.enable_input_require_grads()
model.gradient_checkpointing_enable()
peft_config = peft.LoraConfig(
    task_type=peft.TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    inference_mode=True,
)
model = peft.get_peft_model(model, peft_config)
model.print_trainable_parameters()
model.save_pretrained('/path/to/lora/directory/')
Then I ran python3 -m fastchat.model.apply_lora --base /path/to/vicuna-13b --target /output/path/ --lora /path/to/lora/directory/ (at this point, I followed this dependency).
I think that by applying LoRA you get an extra LoRA model as a plugin, without modifying any of the original weights :)
It does produce a so-called adapter; however, you can merge it into the existing model:
python3 -m fastchat.model.apply_lora --base /path/to/vicuna-13b --target /output/path/ --lora /path/to/lora/directory/
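For reference, here is a rough sketch of the same merge done directly with the peft API (this is not necessarily what apply_lora does internally; the paths follow the ones used above):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, attach the adapter, then fold the low-rank updates into the base weights.
base = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b", torch_dtype=torch.float16)
lora = PeftModel.from_pretrained(base, "/path/to/lora/directory/")
merged = lora.merge_and_unload()
merged.save_pretrained("/output/path/")
AutoTokenizer.from_pretrained("/path/to/vicuna-13b").save_pretrained("/output/path/")

Note that the merged model has exactly the same shape as the base model, so its checkpoint is just as large; only the stand-alone adapter (adapter_model.bin) is small.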
I am also facing the same issue. Investigating the peft library.
I have experience with PEFT, LoRA, and AdaLoRA, but I haven't used the script here to train any PEFT weights. I think the LoRA adapter is saved as a separate model that contains the low-rank weights. It is a separate binary file, which can't be "merged" into the original weights.
For loading the original weights and the adapter's weights via the peft API, you can take a reference from here: peft_adalora_seq2seq.py
Basically, it does the following:
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftConfig, PeftModel

model_name_or_path = "/path/to/lora/directory/"  # the LoRA adapter directory
peft_model_id = f"{model_name_or_path}"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)  # change the auto class to match your base model (e.g. AutoModelForCausalLM for vicuna)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)  # attach the adapter weights on top of the base weights
model.eval()
inputs = tokenizer("example prompt", return_tensors="pt")  # example input
with torch.no_grad():
    outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=32)  # example generation length
    print(outputs)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
Yes, there are two options: load the base model and apply the so-called LoRA adapter, or merge the adapter into the base model.
I did one experiment:
- Trained base model with LoRA training script for 1 epoch.
- Loss dropped from ~4 to ~2.
- Stopped training and merged adapter to base model.
- Resumed training with merged base model.
Guess what the loss was? It was ~2 from the start and continued dropping. After reaching a loss of around 0.08, training finished. However, when I load the second merged model, it still doesn't show the training results. It looks like I haven't trained at all.
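If you want to check whether the merge actually changed anything, one way (a sketch, assuming a llama-style module layout and the paths used earlier in this thread) is to compare a merged weight against the corresponding base weight:

import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("/path/to/vicuna-13b", torch_dtype=torch.float16)
merged = AutoModelForCausalLM.from_pretrained("/output/path/", torch_dtype=torch.float16)
# Compare one of the LoRA target modules (q_proj); a difference of ~0 means the adapter was never merged in.
diff = (merged.model.layers[0].self_attn.q_proj.weight - base.model.layers[0].self_attn.q_proj.weight).abs().max()
print(diff)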
I added model.save_pretrained('/output/path/') after model.eval() in the code above, but it didn't solve this problem and didn't reduce the weight file size.
Please check here if you only want to store the LoRA adapter part. Basically, the state dict contains every weight, including those that are not trainable in LoRA, so you need to pick out the ones created by LoRA and store only those.
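A minimal sketch of that filtering, assuming a model wrapped with get_peft_model as in the script above (the helper below is illustrative, not FastChat's exact code): keys created by LoRA contain "lora_", and with bias="all" the bias terms are kept as well.

import torch

def lora_only_state_dict(model, bias="none"):
    # Keep only the parameters that LoRA added; optionally keep bias terms too.
    full = model.state_dict()
    if bias == "all":
        return {k: v for k, v in full.items() if "lora_" in k or "bias" in k}
    return {k: v for k, v in full.items() if "lora_" in k}

torch.save(lora_only_state_dict(model, bias="all"), "/path/to/lora/directory/adapter_model.bin")

peft's get_peft_model_state_dict utility does essentially this selection for you, based on the bias setting in LoraConfig.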
With LoRA bias "all", the expected result is received. Thank you :+1: