
[BUG]: saved model parameters are incorrect

Open tianbuwei opened this issue 2 years ago • 7 comments

🐛 Describe the bug

After training the Llama model with the LoRA method, the model can be saved normally. However, LoRA's parameters were not included in the saved model parameters, so inference fails at prediction time.

This is the script I executed (screenshot: Snipaste_2023-04-12_17-10-37).

This is what happens when loading the saved model parameters (screenshot: Snipaste_2023-04-12_17-10-09).

Environment

No response

tianbuwei avatar Apr 12 '23 09:04 tianbuwei

Same puzzle here! When I set lora_rank to 0, training runs successfully, but the saved model file is very big, about 13GB!
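The 13GB figure is consistent with the full model being written out rather than a small LoRA-only adapter. A back-of-envelope check (assuming a ~7B-parameter Llama checkpoint stored in half precision; the function name is illustrative):

```python
def checkpoint_size_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate checkpoint size in GiB (bytes_per_param=2 for fp16/bf16)."""
    return n_params * bytes_per_param / 1024**3

# ~7B parameters in fp16 is roughly 13 GiB, so a 13GB file means the
# entire base model was saved, not just the low-rank LoRA matrices.
```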

HaixHan avatar Apr 12 '23 13:04 HaixHan

When you set lora_rank to 0, you are training the model without LoRA, as described here (screenshot).

tianbuwei avatar Apr 12 '23 13:04 tianbuwei

> When you set lora_rank to 0, you are training the model without LoRA, as described here (screenshot).

Yes, I know that setting lora_rank to 0 means training without LoRA. But does lora_rank need to be set to the same value in stage 1, stage 2, and stage 3? I think that's unnecessary, but when I do it anyway, I hit the same error you met when loading the model!

HaixHan avatar Apr 12 '23 13:04 HaixHan

> When you set lora_rank to 0, you are training the model without LoRA, as described here (screenshot).
>
> Yes, I know that setting lora_rank to 0 means training without LoRA. But does lora_rank need to be set to the same value in stage 1, stage 2, and stage 3? I think that's unnecessary, but when I do it anyway, I hit the same error you met when loading the model!

In stage 2, I set lora_rank to 6; after training finished, the size of the RM checkpoint was 13GB (screenshot).

HaixHan avatar Apr 12 '23 13:04 HaixHan

Hi @HaixHan @tianbuwei, the LoRA parameters are merged with the base weights when the final model weights are saved; you can refer to these lines in lora.py.
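For readers unfamiliar with the merge step, here is a minimal sketch of what "merging LoRA into the base weights" means mathematically. The function name and the `scaling` argument are illustrative assumptions, not ColossalAI's actual API (that lives in lora.py):

```python
import torch

def merge_lora_into_linear(weight: torch.Tensor,
                           lora_A: torch.Tensor,
                           lora_B: torch.Tensor,
                           scaling: float = 1.0) -> torch.Tensor:
    """Fold the low-rank update into the frozen weight: W' = W + scaling * (B @ A).

    Shapes: weight (out, in), lora_A (r, in), lora_B (out, r).
    """
    return weight + scaling * (lora_B @ lora_A)
```

Once every LoRA linear layer has been merged this way, the checkpoint contains only plain weight keys, so vanilla (non-LoRA) inference code can load it directly.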

Camille7777 avatar Apr 20 '23 08:04 Camille7777

@Camille7777 Hello, I found that the code you submitted did not solve the problem of saving model parameters after LoRA training. After training, the model's parameters were identical to those of the original model; the LoRA layer parameters were not merged into it (screenshot: Snipaste_2023-04-20_17-45-43).

tianbuwei avatar Apr 20 '23 09:04 tianbuwei

> @Camille7777 Hello, I found that the code you submitted did not solve the problem of saving model parameters after LoRA training. After training, the model's parameters were identical to those of the original model; the LoRA layer parameters were not merged into it (screenshot: Snipaste_2023-04-20_17-45-43).

So, how can we save only the LoRA weights after training is complete? Have you already done this? Looking forward to your reply!

HaixHan avatar Apr 25 '23 07:04 HaixHan

Same error, any updates?

evi-Genius avatar Jun 08 '23 07:06 evi-Genius