
[Question] How to merge an intermediate checkpoint with LoRA

terminator123 opened this issue · 6 comments

Question

I want to test `checkpoint-5000` from my LoRA run. When I ran `python scripts/merge_lora_weights.py --model-path ./checkpoints/llava-v1.5-13b-lora --model-base lmsys/vicuna-13b-v1.5 --save-model-path ./checkpoints/merge`, it failed.

terminator123 · Dec 15 '23

You need to copy `config.json` and `non_lora_trainables.bin` into your `checkpoint-5000` folder.
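A minimal sketch of that step, assuming the layout from the command in the question (a run directory `./checkpoints/llava-v1.5-13b-lora` that contains the `checkpoint-5000` subfolder); the paths are placeholders, adjust them to your run:

```python
import shutil

# Paths are assumptions taken from the command in the question; adjust to your run.
run_dir = "./checkpoints/llava-v1.5-13b-lora"
ckpt_dir = f"{run_dir}/checkpoint-5000"

# The intermediate checkpoint folder only holds the LoRA adapter weights, so copy
# the model config and the non-LoRA trainables (the mm projector) alongside them.
for fname in ("config.json", "non_lora_trainables.bin"):
    shutil.copy(f"{run_dir}/{fname}", f"{ckpt_dir}/{fname}")
```

After that, point `--model-path` at the checkpoint folder, e.g. `python scripts/merge_lora_weights.py --model-path ./checkpoints/llava-v1.5-13b-lora/checkpoint-5000 --model-base lmsys/vicuna-13b-v1.5 --save-model-path ./checkpoints/merge`.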

Isaachhh · Dec 29 '23

I also have the same problem #1194. Did you solve it?

charismaticchiu · Feb 28 '24

> You need to copy `config.json` and `non_lora_trainables.bin` into your `checkpoint-5000` folder.

Are `config.json` and `non_lora_trainables.bin` saved only at the end of the entire training? I have set 10 epochs; can I copy these two files from the epoch-10 output directly into the first nine checkpoints?

wuwu-C · Apr 20 '24

> Are `config.json` and `non_lora_trainables.bin` saved only at the end of the entire training?

I think so.

> I have set 10 epochs; can I copy these two files from the epoch-10 output directly into the first nine checkpoints?

The projector weights are saved in `non_lora_trainables.bin`, and the projector is unfrozen (trained) during the SFT stage, so the epoch-10 file would not match the projector as it was at the earlier checkpoints.

Isaachhh · Apr 20 '24

Thank you for your reply! But I still have some questions.

> The projector weights are saved in `non_lora_trainables.bin`, and the projector is unfrozen (trained) during the SFT stage.

  1. Doesn't `non_lora_trainables.bin` store the weights that are not part of the LoRA-tuned portion? Shouldn't those weights be frozen? Why does it store the projector weights?
  2. In your previous answer you said to copy the two files into the corresponding checkpoint folder. If the projector is unfrozen during the SFT stage, that approach is incorrect. How can I merge an intermediate checkpoint with LoRA? Could you give a more detailed explanation? Thank you!

wuwu-C · Apr 21 '24

> 1. Doesn't `non_lora_trainables.bin` store the weights that are not part of the LoRA-tuned portion? Shouldn't those weights be frozen? Why does it store the projector weights?
> 2. If the projector is unfrozen during the SFT stage, copying the two files is incorrect. How can I merge an intermediate checkpoint with LoRA?
  1. The name is `non_lora_trainables`: non-LoRA *and* trainable. It stores the projector because the projector is trained directly rather than through LoRA. You can check it yourself:

```python
import torch

# Inspect what non_lora_trainables.bin actually contains; the keys should be the mm projector weights.
a = torch.load('.../non_lora_trainables.bin')
print(a.keys())
```

  2. Yes, you are right. You may need to edit the source code so that the projector weights are also saved at intermediate checkpoints.
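A rough sketch of one way to do that without touching LLaVA's training loop itself: a `transformers` `TrainerCallback` that dumps every trainable non-LoRA parameter into the checkpoint folder on each save. The callback name and the `"lora_" not in name` filter are my assumptions, not code from the LLaVA repo:

```python
import os

import torch
from transformers import TrainerCallback


class SaveNonLoraTrainablesCallback(TrainerCallback):
    """On every checkpoint save, also write the trainable non-LoRA weights
    (e.g. the mm projector) to non_lora_trainables.bin in that checkpoint folder."""

    def on_save(self, args, state, control, **kwargs):
        model = kwargs["model"]
        # Keep parameters that are trained directly (requires_grad) but are not LoRA adapters.
        non_lora_trainables = {
            name: param.detach().cpu()
            for name, param in model.named_parameters()
            if param.requires_grad and "lora_" not in name
        }
        ckpt_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        os.makedirs(ckpt_dir, exist_ok=True)
        torch.save(non_lora_trainables,
                   os.path.join(ckpt_dir, "non_lora_trainables.bin"))
        return control
```

Register it with `trainer.add_callback(SaveNonLoraTrainablesCallback())` before calling `trainer.train()`. Note that this naive version does not gather parameters sharded by DeepSpeed ZeRO-3; if you train with ZeRO-3 you would need to gather them first, similar to what LLaVA's end-of-training save does.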

Isaachhh · Apr 22 '24