
How to load a model pre-trained on a 52k dataset and continue fine-tuning with another dataset.json?

Open T-Atlas opened this issue 2 years ago • 16 comments

I've been trying to load a LoRA model that I trained on a 52k dataset and continue fine-tuning it with another data.json file. I have referred to the discussions in the following issues: #52 and #44, but I'm still unable to figure out which part of the finetune.py script I need to modify to accomplish this.

I want to start fine-tuning from a checkpoint instead of from scratch because I believe that the model might be able to learn more task-specific knowledge this way. However, I'm not sure if my assumption is correct.

Could you please guide me on how to modify the finetune.py script to load a pre-trained LoRA model and continue fine-tuning from a checkpoint using a new data.json file?

Thank you in advance for your assistance!

T-Atlas avatar Mar 20 '23 10:03 T-Atlas

I'm no expert, but I think you can load the pre-trained LoRA after loading the LLaMA model, like this in finetune.py:

import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map=device_map,
)
# Load the pre-trained LoRA weights from the lora-alpaca folder on top of the base model
model = PeftModel.from_pretrained(model, "./lora-alpaca", torch_dtype=torch.float16)


felri avatar Mar 20 '23 12:03 felri

Thank you very much for your reply! Is the pre-trained LoRA weight in the './lora-alpaca' path this file: https://huggingface.co/tloen/alpaca-lora-7b? And should I use trainer.train(resume_from_checkpoint=xx)?

T-Atlas avatar Mar 20 '23 13:03 T-Atlas

@T-Atlas I was wondering whether training an already fine-tuned model again and again with new datasets is efficient or not.

Could you please come back here and tell us about the result?

  • whether you managed to teach the fine-tuned model new things from the new dataset
  • and, more importantly, whether the new model forgets anything from the previous training

baleksey avatar Mar 20 '23 14:03 baleksey

Did you guys find a solution? I also want to fine-tune the model on a specific topic, starting from the alpaca-cleaned checkpoint and continuing with a new dataset.

makovez avatar Mar 20 '23 15:03 makovez

not yet

T-Atlas avatar Mar 20 '23 15:03 T-Atlas

Thank you for your message. Regarding your concern, I believe that fine-tuning an already fine-tuned model with new datasets can be an effective way to adapt the model to new tasks or domains. However, this approach depends on various factors, and catastrophic forgetting is a major challenge. As for results, to be honest, I have not conducted any specific experiments on this topic yet, but I will definitely share my findings once I have identified potential solutions to the current issue.

T-Atlas avatar Mar 20 '23 16:03 T-Atlas

@T-Atlas Yeah, I believe it's possible and effective to train an already fine-tuned model to do a new task. What I was wondering is: can I add new knowledge on top of the existing knowledge? For example, adding a math/chemistry/physics examples dataset to the current Alpaca so it handles those instructions much better, while keeping the old knowledge in place.

baleksey avatar Mar 20 '23 16:03 baleksey

Yes, your idea is correct. Models that have been fine-tuned can be trained to add new knowledge while retaining old knowledge. I remember this being called incremental learning.

T-Atlas avatar Mar 20 '23 16:03 T-Atlas

@T-Atlas do you have any idea of how to do this with alpaca?

makovez avatar Mar 20 '23 16:03 makovez

Not yet. I plan to think about it again at work tomorrow.

T-Atlas avatar Mar 20 '23 16:03 T-Atlas

Is the pre-trained LoRA weight in the './lora-alpaca' path this file: https://huggingface.co/tloen/alpaca-lora-7b?

That link is the LoRA model trained by the repo owner; he made it public so we don't need to run the training again. The lora-alpaca folder is the path where your fine-tuned LoRA will be created. After training, you can just change generate.py to point to that folder:

model = PeftModel.from_pretrained(model,'./lora-alpaca', torch_dtype=torch.float16)

And should I use trainer.train(resume_from_checkpoint=xx)?

Good question. I just got back to my PC and I'll start trying some things, as I'm lost too.

felri avatar Mar 20 '23 17:03 felri

Just going to put a . in here as I'm facing the same issue. I've talked with T-Atlas a little bit over e-mail and we're hitting the exact same wall.

zachNA2 avatar Mar 20 '23 17:03 zachNA2

./lora-alpaca contains the LoRA model alone. You can re-train from an existing LoRA instead of from scratch; you'll just start with a blank optimizer state.

If you run the initial training yourself, you'll get three checkpoint-nnn directories in the lora-alpaca dir (one per epoch). There you have the LoRA weights but also the full optimizer state. With these, you can "resume_from_checkpoint" keeping the full state, instead of resetting the training (you'll then need to train for more than 3 epochs, since you'll already start at epoch 3).
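For illustration, the resume call itself is just the following (a sketch only, assuming the trainer and model objects already built in finetune.py; the checkpoint path is an example):

# Sketch: resume from a checkpoint directory written by a previous run.
# That directory must contain the optimizer/scheduler state files,
# not just the adapter weights.
trainer.train(resume_from_checkpoint="lora-alpaca/checkpoint-200")
model.save_pretrained("lora-alpaca")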

AngainorDev avatar Mar 21 '23 08:03 AngainorDev

Hi @T-Atlas, this may be a silly answer, but did you try using this?

trainer.train(resume_from_checkpoint=checkpoint_local_path)

Where the checkpoint_local_path variable is the relative path of a local folder containing the pre-trained weights.

It's perfectly working on my side for flan-t5 (seq2seq) or gpt-j (causal) model training, so it should work out of the box even with PEFT.

EDIT: The previous statement doesn't work because the saved .bin model is not a full checkpoint but just the updated LoRA weights (as far as I understand). You need to use the export_hf_checkpoint.py file to merge your fine-tuned weights into a new model that could be used for inference or for later fine-tuning tasks.
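For reference, the merge that export_hf_checkpoint.py performs amounts to roughly the following (a sketch only; the repo script does the merge its own way, and merge_and_unload from a recent peft release is just one way to express the same idea, with illustrative paths):

import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

# Fold the LoRA deltas into the base weights and save the result as a
# plain HF checkpoint that can serve as a new base model.
base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b", torch_dtype=torch.float16)
merged = model.merge_and_unload()
merged.save_pretrained("./hf_ckpt")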

chainyo avatar Mar 21 '23 18:03 chainyo

@ChainYo Thank you very much for your detailed reply! So what you mean is that I don't need this code:

from peft import PeftModel
model = PeftModel.from_pretrained(model, <PATH>)

Instead, I just need to export the HF checkpoint as you described and resume from it like this:

trainer.train(resume_from_checkpoint="./hf_ckpt")

I will try!

T-Atlas avatar Mar 22 '23 02:03 T-Atlas

@T-Atlas If ./hf_ckpt contains checkpoints from past training, yes.

But it seems easier to take the LoRA-trained parameters, merge them with the base model (for example decapoda-research/llama-7b-hf), and then start a new training run on another dataset using the newly created model (the output of the parameter merge).

Because from what I see on my side, the save_steps and push_to_hub options are not saving full checkpoints but only the adapter model (based on the file size): [screenshot of the saved adapter files showing their sizes]

chainyo avatar Mar 22 '23 08:03 chainyo

I tried trainer.train(resume_from_checkpoint="./lora-alpaca/checkpoint-1400"), and ./lora-alpaca/checkpoint-1400 does contain checkpoints from past training.

But I see a warning on my console when I run finetune.py:

There were missing keys in the checkpoint model loaded: ['base_model.model.model.embed_tokens.weight', 'base_model.model.model.layers.0.self_attn.q_proj.weight', 'base_model.model.model.layers.0.self_attn.k_proj.weight', 'base_model.model.model.layers.0.self_attn.v_proj.weight', 'base_model.model.model.layers.0.self_attn.o_proj.weight', 'base_model.model.model.layers.0.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.0.mlp.gate_proj.weight', 'base_model.model.model.layers.0.mlp.down_proj.weight', 'base_model.model.model.layers.0.mlp.up_proj.weight', 'base_model.model.model.layers.0.input_layernorm.weight', 'base_model.model.model.layers.0.post_attention_layernorm.weight', ... (the same self_attn/mlp/layernorm keys repeated for layers 1 through 31) ..., 'base_model.model.model.norm.weight', 'base_model.model.lm_head.0.weight'].

Should I take any further steps to solve it?

EDIT: I found print("\n If there's a warning about missing keys above, please disregard :)") in the script, so it looks like this warning can be safely ignored.

T-Atlas avatar Mar 23 '23 09:03 T-Atlas

I'm doing the same thing and the approach suggested by @chainyo in https://github.com/tloen/alpaca-lora/issues/92#issuecomment-1479065352 worked for me.

Here are the exact steps I'm using:

  • Merge the LLaMA weights with the LoRA weights by running the "export_hf_checkpoint.py" script. This will combine "decapoda-research/llama-7b-hf" and "tloen/alpaca-lora-7b". The combined model will be put in "./hf_ckpt". You can verify this is the fine-tuned model by testing it with https://github.com/oobabooga/text-generation-webui
  • Update "finetune.py": change the model path from "decapoda-research/llama-7b-hf" to "./hf_ckpt" so it will fine-tune on top of the already fine-tuned model; change "DATA_PATH" to my own dataset file (see the sketch below).
  • Run "finetune.py". This will create a fine-tuned LoRA on top of the already fine-tuned model from "./hf_ckpt" and output new LoRA weights to "./lora-alpaca".
  • Test the fine-tuned LoRA using the "generate.py" script. You need to change BASE_MODEL to "./hf_ckpt" and LORA_WEIGHTS to "./lora-alpaca".

Although the process works, the result is far from ideal for my use case. It seems quite hard for the model to remember specific things like "This API does XYZ" by including just that in the dataset.
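For step 2, the finetune.py edits are roughly the following (a sketch; the exact constant names depend on the version of the script you have):

# Illustrative only: point the script at the merged model and at your dataset.
DATA_PATH = "my_dataset.json"      # your dataset instead of the original Alpaca JSON

model = LlamaForCausalLM.from_pretrained(
    "./hf_ckpt",                   # was "decapoda-research/llama-7b-hf"
    load_in_8bit=True,
    device_map=device_map,
)
tokenizer = LlamaTokenizer.from_pretrained("./hf_ckpt")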

fengx1024 avatar Mar 23 '23 17:03 fengx1024

I'm happy to let you know that I was able to resolve the issue by following your approach.

T-Atlas avatar Mar 24 '23 00:03 T-Atlas

Merging is suboptimal in many ways.

I was able to continue a previous training run successfully without merging the weights, just carrying on with the previous LoRA. This works either with just a previous LoRA and nothing else, or with saved checkpoints that include the optimizer state as well (in that case it's really a "continue/extend" of the training). Quite useful when you asked for too few epochs and want to push further.

I can share my current tweaks to the code if this is of help.

AngainorDev avatar Mar 24 '23 09:03 AngainorDev

Could you please share your code for how to continue training with an existing LoRA?

baleksey avatar Mar 24 '23 10:03 baleksey

My running version is more heavily customized, but here are the minimal needed changes: https://github.com/tloen/alpaca-lora/pull/154

AngainorDev avatar Mar 24 '23 11:03 AngainorDev

Thank you! Is your loss after loading the checkpoint/LoRA somewhere near the loss where training stopped last time?

The last time I tried to continue training (not with your code) it showed a much bigger loss, which then went down VERY slowly. It looks like it's starting from the beginning, or I just don't get it. As an example: the last checkpoint loss was ~0.7917; after resuming training it became 8.7 (about 11x bigger), and after 60 steps it still showed 5.72.

baleksey avatar Mar 24 '23 11:03 baleksey

When using a full checkpoint (including the optimizer state), the loss is continuous and keeps dropping.

If using a different dataset than Stanford's, beware of the changed step count and warmup settings.

I save the checkpoints by epoch instead of by steps myself, and tuned the warmup steps so they match one epoch: the lr rises to the target value over 1 epoch, then decays. When resuming from a full checkpoint, there is no more warmup and the lr continues to decay. (This assumes you kept the same dataset; otherwise you should not do a full resume with optimizer state.)

If you change the dataset in between, like further tuning an Alpaca LoRA on your own dataset, then only load the LoRA weights, not the trainer state (it then starts a new training run, with warmup and a clean optimizer state).
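For that second case, my changes boil down to roughly this (a sketch; see the PR for the exact code; the adapter path is illustrative and model/trainer are the objects built in finetune.py):

import torch
from peft import set_peft_model_state_dict

# Load only the previous adapter weights into the freshly created PEFT model,
# then train as usual on the new dataset (fresh optimizer state, new warmup).
adapters_weights = torch.load("./lora-alpaca/adapter_model.bin")
set_peft_model_state_dict(model, adapters_weights)

trainer.train()   # no resume_from_checkpoint here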

AngainorDev avatar Mar 24 '23 11:03 AngainorDev

I set CONTINUE_FROM_CHECKPOINT = "./alpaca-lora-7b", which I downloaded from Hugging Face,

and I get:

Restarting from ./alpaca-lora-7b/adapter_model.bin
Traceback (most recent call last):
  File "/DATA/lianjh/alpaca-lora/finetune.py", line 70, in <module>
    adapters_weights = torch.load(checkpoint_name)
  File "/home/lianjh/miniconda3/envs/alpaca/lib/python3.9/site-packages/torch/serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/lianjh/miniconda3/envs/alpaca/lib/python3.9/site-packages/torch/serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

What should I do? #154

T-Atlas avatar Mar 25 '23 04:03 T-Atlas

I tried the same to make sure: got the weights from HF and continued training from them, no issue.

Restarting from ./lora-alpaca/alpaca-lora-7b/adapter_model.bin
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
[...]

Your error seems to be a pickle error when loading the .bin, before anything else happens. So I'd suspect a corrupted download, or maybe a version mismatch (torch, python?). Does that very same file work for inference in the same env?

Double-check that you have the full LoRA file and not just a Git LFS pointer. Download it manually and check the size: it should be about 16.8 MB.
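A quick way to check (just a sketch): a Git LFS pointer is a tiny text file starting with "version https://git-lfs...", while the real adapter_model.bin is around 16-17 MB.

import os

path = "./alpaca-lora-7b/adapter_model.bin"   # adjust to your local path
print(f"{os.path.getsize(path) / 1e6:.1f} MB")
with open(path, "rb") as f:
    print(f.read(40))   # an LFS pointer starts with b'version https://git-lfs...'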

AngainorDev avatar Mar 25 '23 08:03 AngainorDev

Thank you very much. I checked my files again and found that they were not downloaded correctly. My problem has been solved. Thanks for your contribution and patience.

T-Atlas avatar Mar 25 '23 13:03 T-Atlas