
How to correctly load and merge finetuned LLaMA models in different formats?

Open chenmiaomiao opened this issue 1 year ago • 0 comments

I am new to NLP and am currently exploring the LLaMA model. I understand that the model exists in different formats: the original format and the Hugging Face format. I have fine-tuned LLaMA on my own dataset with https://github.com/lxe/llama-peft-tuner, which is based on minimal-llama, and it saves its models as shown below:

$ ll llama-peft-tuner/models/csco-llama-7b-peft/
total 16456
drwxrwxr-x 8 lachlan lachlan     4096 May 10 10:42 ./
drwxrwxr-x 5 lachlan lachlan     4096 May 10 10:06 ../
drwxrwxr-x 2 lachlan lachlan     4096 May 10 10:21 checkpoint-1000/
drwxrwxr-x 2 lachlan lachlan     4096 May 10 10:28 checkpoint-1500/
drwxrwxr-x 2 lachlan lachlan     4096 May 10 10:35 checkpoint-2000/
drwxrwxr-x 2 lachlan lachlan     4096 May 10 10:42 checkpoint-2500/
drwxrwxr-x 2 lachlan lachlan     4096 May 10 10:13 checkpoint-500/
drwxrwxr-x 2 lachlan lachlan     4096 May 10 10:42 model-final/
-rw-rw-r-- 1 lachlan lachlan 16814911 May 10 10:42 params.p

$ ll llama-peft-tuner/models/csco-llama-7b-peft/checkpoint-2500/
total 7178936
drwxrwxr-x 2 lachlan lachlan       4096 May 10 10:42 ./
drwxrwxr-x 8 lachlan lachlan       4096 May 10 10:42 ../
-rw-rw-r-- 1 lachlan lachlan   33629893 May 10 10:42 optimizer.pt
-rw-rw-r-- 1 lachlan lachlan 7317523229 May 10 10:42 pytorch_model.bin
-rw-rw-r-- 1 lachlan lachlan      14575 May 10 10:42 rng_state.pth
-rw-rw-r-- 1 lachlan lachlan        557 May 10 10:42 scaler.pt
-rw-rw-r-- 1 lachlan lachlan        627 May 10 10:42 scheduler.pt
-rw-rw-r-- 1 lachlan lachlan      28855 May 10 10:42 trainer_state.json
-rw-rw-r-- 1 lachlan lachlan       3899 May 10 10:42 training_args.bin
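
To understand what checkpoint-2500/pytorch_model.bin actually contains, I inspect its keys with a quick script along these lines (the path is taken from the listing above; as far as I know, PEFT's LoRA tensors carry "lora_" in their key names, while a full-model checkpoint only has the regular LLaMA parameter names):

import torch

# Quick look at what the Trainer checkpoint contains (path from the listing above).
ckpt_path = "llama-peft-tuner/models/csco-llama-7b-peft/checkpoint-2500/pytorch_model.bin"
state_dict = torch.load(ckpt_path, map_location="cpu")

# LoRA tensors injected by PEFT have "lora_" in their key names;
# a plain full-model checkpoint has only the regular LLaMA parameter names.
lora_keys = [k for k in state_dict if "lora_" in k]
print(f"{len(state_dict)} tensors total, {len(lora_keys)} LoRA tensors")
print("first keys:", list(state_dict)[:5])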

I am not quite sure about the relationship between pytorch_model.bin, the original model, and adapter_model.bin. I suppose pytorch_model.bin is in the Hugging Face format. Now, I want to create a .pth model that I can load in https://github.com/juncongmoo/pyllama/tree/main/apps/gradio.
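
My (possibly wrong) understanding of the relationship: adapter_model.bin is what PEFT writes when save_pretrained is called on the LoRA-wrapped model, i.e. only the small adapter weights, while pytorch_model.bin here appears to be the Trainer's checkpoint of the entire wrapped model. If I had a standard adapter directory (adapter_config.json plus adapter_model.bin), I would expect the merge into a plain Hugging Face model to look roughly like this sketch (all paths are placeholders):

import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

base_model_path = "path/to/llama-7b-hf"   # placeholder: base model in HF format
adapter_path = "path/to/peft-adapter"     # placeholder: adapter_config.json + adapter_model.bin

# Load the base model, attach the LoRA adapter, then fold the deltas in.
base = LlamaForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_path)
merged = model.merge_and_unload()

# Save a plain Hugging Face checkpoint with the LoRA already merged.
merged.save_pretrained("path/to/merged-hf-model")

As far as I understand, merge_and_unload() folds the LoRA deltas into the base weights, so the saved model no longer needs peft at inference time.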

I followed the manual conversion guide at https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Manual-Conversion, which merges a base model with LoRA weights and outputs either the Hugging Face format (.bin) or the PyTorch format (.pth). I tried treating my pytorch_model.bin as the Hugging Face format and modified the code to skip the LoRA step, but I couldn't get the desired result. The fine-tuning repository mentioned above (llama-peft-tuner) loads the trained model by combining the original model with the learned parameters, so I tried to adapt that approach into https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/scripts/merge_llama_with_chinese_lora.py and tried various combinations, but the result either does not incorporate the trained parameters or generates meaningless output.
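
For reference, this is the kind of sanity check I have in mind to verify whether the trained parameters are picked up at all before worrying about the .pth conversion (the base model path is a placeholder; strict=False is used because the checkpoint keys may carry a PEFT prefix such as "base_model.model."):

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
base_model_path = "path/to/llama-7b-hf"  # placeholder: base model in HF format

tokenizer = LlamaTokenizer.from_pretrained(base_model_path)
model = LlamaForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

ckpt = torch.load(
    "llama-peft-tuner/models/csco-llama-7b-peft/checkpoint-2500/pytorch_model.bin",
    map_location="cpu",
)
# strict=False so mismatching keys (e.g. a PEFT "base_model.model." prefix or
# lora_* tensors) are reported instead of raising an error.
missing, unexpected = model.load_state_dict(ckpt, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))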

Can someone help me understand how to correctly load and merge these models? Any help would be greatly appreciated. Thank you.

chenmiaomiao • May 12 '23 10:05