
Fine-tuning alpaca-lora with a custom dataset gives poor results

Open nkjulia opened this issue 1 year ago • 19 comments

I have fine-tuned alpaca-lora on about 60 thousand records, just for generating prompts for text-to-image tasks such as Stable Diffusion.

The decrease in loss is in line with expectations, but I get poor results at inference time... any suggestions? Thanks a lot.

nkjulia avatar Apr 12 '23 08:04 nkjulia

I have the same problem. After training on 700 MB of Chinese data (about 350 thousand records), I also got poor results, even worse than the original model.

xv994 avatar Apr 12 '23 09:04 xv994

It's hard to say anything without more context.

AngainorDev avatar Apr 12 '23 11:04 AngainorDev

Make sure your pretrained model's LlamaTokenizer version is up to date, and there are two important factors we need to check carefully: the data format and duplication.
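For example, a quick tokenizer sanity check might look like this (a minimal sketch; which special-token ids to expect depends on your checkpoint):

```python
from transformers import LlamaTokenizer

# decapoda-research/llama-7b-hf ships a tokenizer_config.json that names the
# class "LLaMATokenizer", which newer transformers releases no longer accept.
tok = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# Wrong bos/eos/pad ids are a common cause of bad generations after fine-tuning.
print(type(tok).__name__)
print("bos:", tok.bos_token_id, "eos:", tok.eos_token_id, "pad:", tok.pad_token_id)
print(tok("hello world").input_ids)
```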

lywinged avatar Apr 12 '23 14:04 lywinged

Make sure your pretrained model's LlamaTokenizer version is up to date, and there are two important factors we need to check carefully: the data format and duplication.

You made a very interesting point! What happened to LlamaTokenizer btw?

yana-xuyan avatar Apr 12 '23 16:04 yana-xuyan

Check this #279

lywinged avatar Apr 12 '23 16:04 lywinged

It's hard to say anything without more context.

pretrained llama model: decapoda-research/llama-7b-hf
data format: raw prompts for text-to-image tasks; didn't use the templates in alpaca.json
training set size: 60,000
trained for 7 epochs; did not get the expected results.

nkjulia avatar Apr 13 '23 01:04 nkjulia

Finally I figured out my problem with inference! The saved model file "adapter_model.bin" is not valid (its size is too small). I replaced it with the .bin file in the last checkpoint directory, and inference now works as expected.

The final model is saved by "model.save_pretrained".
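In code, that workaround looks roughly like this (a sketch; it assumes finetune.py's default ./lora-alpaca output directory, so adjust the paths to your run):

```python
import shutil
from pathlib import Path

output_dir = Path("./lora-alpaca")  # illustrative output dir from finetune.py
adapter = output_dir / "adapter_model.bin"

# A healthy 7B LoRA adapter is tens of MB; a file of a few hundred bytes
# means the save produced an empty state dict.
if adapter.stat().st_size < 1_000_000:
    # Pick the newest checkpoint directory, e.g. checkpoint-400/
    last_ckpt = max(output_dir.glob("checkpoint-*"),
                    key=lambda p: int(p.name.split("-")[1]))
    shutil.copy(last_ckpt / "pytorch_model.bin", adapter)
    print(f"Replaced {adapter} with {last_ckpt / 'pytorch_model.bin'}")
```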

nkjulia avatar Apr 13 '23 09:04 nkjulia

Finally I figured out my problem with inference! The saved model file "adapter_model.bin" is not valid (its size is too small). I replaced it with the .bin file in the last checkpoint directory, and inference now works as expected.

The final model is saved by "model.save_pretrained".

#293

lywinged avatar Apr 13 '23 09:04 lywinged

Finally I figured out my problem with inference! The saved model file "adapter_model.bin" is not valid (its size is too small). I replaced it with the .bin file in the last checkpoint directory, and inference now works as expected.

The final model is saved by "model.save_pretrained".

Hi - can you confirm that replacing the adapter_model.bin with the final checkpoint .bin file solves the issue? I have the same problem.

Thank you

griff4692 avatar Apr 13 '23 16:04 griff4692

Hi, could you please share your training code? For me it's a paraphrasing task: the input needs to be paraphrased. But I am getting poor results, with 60k datapoints.

singularity014 avatar Apr 13 '23 20:04 singularity014

Finally I figured out my problem with inference! The saved model file "adapter_model.bin" is not valid (its size is too small). I replaced it with the .bin file in the last checkpoint directory, and inference now works as expected. The final model is saved by "model.save_pretrained".

Hi - can you confirm that replacing the adapter_model.bin with the final checkpoint .bin file solves the issue? I have the same problem.

Thank you

Yes! You can check whether the inference results match the expected outcomes.
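A minimal way to run that check, loading the base model plus the repaired adapter (a sketch; the paths and generation settings are illustrative):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base = "decapoda-research/llama-7b-hf"
tok = LlamaTokenizer.from_pretrained(base)
model = LlamaForCausalLM.from_pretrained(
    base, torch_dtype=torch.float16, device_map="auto")
# Load the LoRA weights on top of the frozen base model.
model = PeftModel.from_pretrained(model, "./lora-alpaca")
model.eval()

prompt = "a cozy cabin in the woods"  # use a prompt in your training format
inputs = tok(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```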

nkjulia avatar Apr 17 '23 05:04 nkjulia

Hi, could you please share your training code? For me it's a paraphrasing task: the input needs to be paraphrased. But I am getting poor results, with 60k datapoints.

I used the official training script for a simple language-generation task; for your needs, I suggest looking at how the ChatGPT-style dialogue training samples are constructed.
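For reference, one record in the Alpaca instruction format, and the template finetune.py applies to it, look roughly like this (the field values here are invented):

```python
# One record in the Alpaca instruction format (values are made up).
record = {
    "instruction": "Expand this idea into a Stable Diffusion prompt.",
    "input": "a cozy cabin in the woods",
    "output": "a cozy log cabin in a snowy forest, warm light in the windows, "
              "ultra detailed, golden hour, 35mm photo",
}

# The standard Alpaca prompt template the training script formats records with.
template = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)
print(template.format(**record))
```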

nkjulia avatar Apr 17 '23 06:04 nkjulia

@nkjulia Hello, I fine-tuned decapoda-research/llama-7b-hf on 5,834 Chinese records for 10 epochs, and the replies contain a lot of repetition. For example: Q: How do I find a camera with a given MAC address across many switches? A: Hello, for reference: 1. The command to find the camera by MAC address is display mac-address The command to find the camera by MAC address is display mac-address The command to find the camera by MAC address is display mac-address The command to find the camera by MAC address is display mac-address…

The answers are also quite far from the reference answers. What do you think the cause is?

Training run:

(base) root@uni-dzkf-gpu:/usr/local/dbbd/alpaca-lora# conda activate llama
(llama) root@uni-dzkf-gpu:/usr/local/dbbd/alpaca-lora# python finetune.py \

--base_model '/usr/local/dbbd/model/llama-7b-hf' \
--data_path '/usr/local/dbbd/data/kuai_clean_5489.json' \
--output_dir './lora-alpaca' \
--micro_batch_size 16 \
--num_epochs 10 \
--eval_step 50 \
--save_step 50 \
--logging_steps 10 \
--val_set_size 200 \
--lora_r 16 \
--lora_target_modules '[q_proj,k_proj,v_proj,o_proj]'

=================================== BUG REPORT ===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /root/anaconda3/envs/llama/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/root/anaconda3/envs/llama/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /root/anaconda3/envs/llama did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths... warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /root/anaconda3/envs/llama/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...

Training Alpaca-LoRA model with params:
base_model: /usr/local/dbbd/model/llama-7b-hf
data_path: /usr/local/dbbd/data/kuai_clean_5489.json
output_dir: ./lora-alpaca
batch_size: 128
micro_batch_size: 16
num_epochs: 10
eval_step: 50
save_step: 50
logging_steps: 10
learning_rate: 0.0003
cutoff_len: 256
val_set_size: 200
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
train_on_inputs: True
group_by_length: False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: alpaca

Loading checkpoint shards: 100% 33/33 [00:13<00:00, 2.46it/s]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. The class this function is called from is 'LlamaTokenizer'.
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-7397a7778babe591/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100% 1/1 [00:00<00:00, 754.24it/s]
trainable params: 16777216 || all params: 6755192832 || trainable%: 0.24836028248556738
Loading cached split indices for dataset at /root/.cache/huggingface/datasets/json/default-7397a7778babe591/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-f081ff23152c4b4b.arrow and /root/.cache/huggingface/datasets/json/default-7397a7778babe591/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-ff5e359ae7e291a4.arrow
{'loss': 2.9378, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.24}
{'loss': 2.8145, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.48}
{'loss': 2.5446, 'learning_rate': 8.999999999999999e-05, 'epoch': 0.73}
{'loss': 2.1616, 'learning_rate': 0.00011999999999999999, 'epoch': 0.97}
{'loss': 2.0516, 'learning_rate': 0.00015, 'epoch': 1.21}
{'eval_loss': 2.0572762489318848, 'eval_runtime': 32.4293, 'eval_samples_per_second': 6.167, 'eval_steps_per_second': 0.771, 'epoch': 1.21}
{'loss': 2.0044, 'learning_rate': 0.00017999999999999998, 'epoch': 1.45}
{'loss': 1.9565, 'learning_rate': 0.00020999999999999998, 'epoch': 1.69}
{'loss': 1.8968, 'learning_rate': 0.00023999999999999998, 'epoch': 1.93}
{'loss': 1.8612, 'learning_rate': 0.00027, 'epoch': 2.18}
{'loss': 1.7992, 'learning_rate': 0.0003, 'epoch': 2.42}
{'eval_loss': 1.8170967102050781, 'eval_runtime': 32.5632, 'eval_samples_per_second': 6.142, 'eval_steps_per_second': 0.768, 'epoch': 2.42}
{'loss': 1.7634, 'learning_rate': 0.00029032258064516127, 'epoch': 2.66}
{'loss': 1.7366, 'learning_rate': 0.00028064516129032256, 'epoch': 2.9}
{'loss': 1.7071, 'learning_rate': 0.00027096774193548386, 'epoch': 3.14}
{'loss': 1.6842, 'learning_rate': 0.00026129032258064515, 'epoch': 3.38}
{'loss': 1.6623, 'learning_rate': 0.00025161290322580645, 'epoch': 3.63}
{'eval_loss': 1.7218327522277832, 'eval_runtime': 32.6299, 'eval_samples_per_second': 6.129, 'eval_steps_per_second': 0.766, 'epoch': 3.63}
{'loss': 1.6629, 'learning_rate': 0.00024193548387096771, 'epoch': 3.87}
{'loss': 1.6313, 'learning_rate': 0.000232258064516129, 'epoch': 4.11}
{'loss': 1.61, 'learning_rate': 0.0002225806451612903, 'epoch': 4.35}
{'loss': 1.6067, 'learning_rate': 0.0002129032258064516, 'epoch': 4.59}
{'loss': 1.5914, 'learning_rate': 0.00020322580645161287, 'epoch': 4.83}
{'eval_loss': 1.6802856922149658, 'eval_runtime': 32.3826, 'eval_samples_per_second': 6.176, 'eval_steps_per_second': 0.772, 'epoch': 4.83}
{'loss': 1.5789, 'learning_rate': 0.00019354838709677416, 'epoch': 5.08}
{'loss': 1.5424, 'learning_rate': 0.00018387096774193548, 'epoch': 5.32}
{'loss': 1.549, 'learning_rate': 0.00017419354838709678, 'epoch': 5.56}
{'loss': 1.5555, 'learning_rate': 0.00016451612903225804, 'epoch': 5.8}
{'loss': 1.5341, 'learning_rate': 0.00015483870967741934, 'epoch': 6.04}
{'eval_loss': 1.6582854986190796, 'eval_runtime': 32.6762, 'eval_samples_per_second': 6.121, 'eval_steps_per_second': 0.765, 'epoch': 6.04}
{'loss': 1.5065, 'learning_rate': 0.00014516129032258063, 'epoch': 6.28}
{'loss': 1.512, 'learning_rate': 0.00013548387096774193, 'epoch': 6.53}
{'loss': 1.494, 'learning_rate': 0.00012580645161290322, 'epoch': 6.77}
{'loss': 1.515, 'learning_rate': 0.0001161290322580645, 'epoch': 7.01}
{'loss': 1.4671, 'learning_rate': 0.0001064516129032258, 'epoch': 7.25}
{'eval_loss': 1.6482064723968506, 'eval_runtime': 32.0293, 'eval_samples_per_second': 6.244, 'eval_steps_per_second': 0.781, 'epoch': 7.25}
{'loss': 1.4691, 'learning_rate': 9.677419354838708e-05, 'epoch': 7.49}
{'loss': 1.4644, 'learning_rate': 8.709677419354839e-05, 'epoch': 7.73}
{'loss': 1.4666, 'learning_rate': 7.741935483870967e-05, 'epoch': 7.98}
{'loss': 1.4392, 'learning_rate': 6.774193548387096e-05, 'epoch': 8.22}
{'loss': 1.4391, 'learning_rate': 5.806451612903225e-05, 'epoch': 8.46}
{'eval_loss': 1.6425567865371704, 'eval_runtime': 32.3429, 'eval_samples_per_second': 6.184, 'eval_steps_per_second': 0.773, 'epoch': 8.46}
{'loss': 1.4339, 'learning_rate': 4.838709677419354e-05, 'epoch': 8.7}
{'loss': 1.4427, 'learning_rate': 3.8709677419354835e-05, 'epoch': 8.94}
{'loss': 1.4231, 'learning_rate': 2.9032258064516126e-05, 'epoch': 9.18}
{'loss': 1.4173, 'learning_rate': 1.9354838709677417e-05, 'epoch': 9.43}
{'loss': 1.425, 'learning_rate': 9.677419354838709e-06, 'epoch': 9.67}
{'eval_loss': 1.6385363340377808, 'eval_runtime': 32.2263, 'eval_samples_per_second': 6.206, 'eval_steps_per_second': 0.776, 'epoch': 9.67}
{'loss': 1.421, 'learning_rate': 0.0, 'epoch': 9.91}
100% 410/410 [9:03:49<00:00, 78.44s/it]
There were missing keys in the checkpoint model loaded: ['base_model.model.model.embed_tokens.weight', … the self_attn q/k/v/o_proj weights, rotary_emb.inv_freq, mlp gate/down/up_proj weights, and input_layernorm/post_attention_layernorm weights repeated for each of layers 0-31 …, 'base_model.model.model.norm.weight', 'base_model.model.lm_head.0.weight'].
{'train_runtime': 32629.1971, 'train_samples_per_second': 1.621, 'train_steps_per_second': 0.013, 'train_loss': 1.7019490381566489, 'epoch': 9.91}
100% 410/410 [9:03:49<00:00, 79.58s/it]

If there's a warning about missing keys above, please disregard :)

Tungsong avatar Apr 17 '23 11:04 Tungsong

micro_batch_size is too large, so the learning is too coarse. You also need to deduplicate the data by cosine similarity, otherwise the model falls into loops and answers everything the same way. Most critically, the loss is not low enough: generally only below 0.5 does it reach a level the industry can marginally use, and at 0.7-0.8 some of the answers become somewhat reasonable. Right now your training loss decreases steadily while the validation loss moves slowly, which very likely means a data problem. A 7B base model is already very small, and training complex problems on it with LoRA is even harder. You could find some simple Chinese datasets to train on basic questions first, then move on to domain Q&A. For a task as hard as yours, though, training at 30B would be feasible.
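A minimal sketch of that cosine-similarity deduplication, using TF-IDF character n-grams so it works on Chinese without a word segmenter (the 0.9 threshold is an arbitrary choice):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def dedup(texts, threshold=0.9):
    # Character n-grams avoid needing a word segmenter for Chinese text.
    vecs = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(texts)
    sim = cosine_similarity(vecs)
    keep = []
    for i in range(len(texts)):
        # Drop a record if it is too similar to one we already kept.
        if all(sim[i, j] < threshold for j in keep):
            keep.append(i)
    return [texts[i] for i in keep]

samples = ["怎么查询MAC地址", "如何查询MAC地址", "怎么配置VLAN"]
print(dedup(samples))  # the near-duplicate question is removed
```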

lywinged avatar Apr 17 '23 11:04 lywinged

micro_batch_size is too large, so the learning is too coarse. You also need to deduplicate the data by cosine similarity, otherwise the model falls into loops and answers everything the same way. Most critically, the loss is not low enough: generally only below 0.5 does it reach a level the industry can marginally use, and at 0.7-0.8 some of the answers become somewhat reasonable. Right now your training loss decreases steadily while the validation loss moves slowly, which very likely means a data problem. A 7B base model is already very small, and training complex problems on it with LoRA is even harder. You could find some simple Chinese datasets to train on basic questions first, then move on to domain Q&A. For a task as hard as yours, though, training at 30B would be feasible.

OK, thanks for your reply. I'll adjust as you suggest and try again.

Tungsong avatar Apr 17 '23 11:04 Tungsong

micro_batch_size is too large, so the learning is too coarse. You also need to deduplicate the data by cosine similarity, otherwise the model falls into loops and answers everything the same way. Most critically, the loss is not low enough: generally only below 0.5 does it reach a level the industry can marginally use, and at 0.7-0.8 some of the answers become somewhat reasonable. Right now your training loss decreases steadily while the validation loss moves slowly, which very likely means a data problem. A 7B base model is already very small, and training complex problems on it with LoRA is even harder. You could find some simple Chinese datasets to train on basic questions first, then move on to domain Q&A. For a task as hard as yours, though, training at 30B would be feasible.

OK, thanks for your reply. I'll adjust as you suggest and try again.

Honestly, your current approach is rather impractical. You should handle domain-specific replies with LangChain + Pinecone, because professional answers still need to draw on material that is known to be correct and to cite the source documents in the response.

lywinged avatar Apr 17 '23 11:04 lywinged

@lywinged Yes, we already have a dialogue QA system based on BERT and ERNIE, but it only handles simple questions; small models cannot cope with more complex ones, so we are exploring whether an LLM that learns new knowledge could help answer complex questions. My goal is for the model to learn the knowledge in our documents and then give useful reference answers to user questions. Do you have any further suggestions?

Tungsong avatar Apr 17 '23 11:04 Tungsong

@lywinged Yes, we already have a dialogue QA system based on BERT and ERNIE, but it only handles simple questions; small models cannot cope with more complex ones, so we are exploring whether an LLM that learns new knowledge could help answer complex questions. My goal is for the model to learn the knowledge in our documents and then give useful reference answers to user questions. Do you have any further suggestions?

It depends on the company's scale. If you must use an LLM under 50B, the current approach is LLM + LangChain + Pinecone: the LLM only needs to understand roughly what the user is asking, then retrieve the domain material and answer from it.

The idea is that the LLM has a private database. When a question comes in through the interface, similarity matching pulls the relevant context from the database into the input, so the model can answer questions that look like a simple sentence but are actually complex. When answering, the answer is also similarity-matched against the database, and the corresponding sources are returned to the user.
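A toy sketch of that retrieve-then-answer loop; a plain numpy store stands in for Pinecone, and embed() and llm() are hypothetical placeholders for a real sentence encoder and a real LLM:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding function; swap in a real sentence encoder."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

docs = ["Use 'display mac-address' to look up MAC entries on the switch.",
        "Cameras register their MAC address in the switch's address table."]
index = np.stack([embed(d) for d in docs])  # in production: Pinecone upsert

def answer(question: str, llm=lambda p: p):  # llm() is a stand-in
    sims = index @ embed(question)            # cosine similarity (unit vectors)
    best = int(np.argmax(sims))
    prompt = f"Context: {docs[best]}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt), docs[best]            # answer plus the cited source

reply, source = answer("How do I find a camera's MAC address?")
print(source)
```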

lywinged avatar Apr 17 '23 12:04 lywinged

@lywinged Yes, we already have a dialogue QA system based on BERT and ERNIE, but it only handles simple questions; small models cannot cope with more complex ones, so we are exploring whether an LLM that learns new knowledge could help answer complex questions. My goal is for the model to learn the knowledge in our documents and then give useful reference answers to user questions. Do you have any further suggestions?

It depends on the company's scale. If you must use an LLM under 50B, the current approach is LLM + LangChain + Pinecone: the LLM only needs to understand roughly what the user is asking, then retrieve the domain material and answer from it.

The idea is that the LLM has a private database. When a question comes in through the interface, similarity matching pulls the relevant context from the database into the input, so the model can answer questions that look like a simple sentence but are actually complex. When answering, the answer is also similarity-matched against the database, and the corresponding sources are returned to the user.

Thank you very much for your suggestions.

Tungsong avatar Apr 17 '23 12:04 Tungsong

Finally I figured out my problem with inference! The saved model file "adapter_model.bin" is not valid (its size is too small). I replaced it with the .bin file in the last checkpoint directory, and inference now works as expected.

The final model is saved by "model.save_pretrained".

Hi, nkjulia,

May I ask what you mean by "adapter_model.bin is not valid, with small size"? Do you mean that if you increase the number of epochs, adapter_model.bin will be larger? I am following this website (https://www.mlexpert.io/machine-learning/tutorials/alpaca-fine-tuning#data) exactly to fine-tune the same data, but my adapter_model.bin is only 400 bytes, while the original author's adapter_model.bin is 16 MB.

When I run generation with my fine-tuned model, there is no real response; it just keeps repeating the input. But if I use the author's fine-tuned model ('curiousily/alpaca-bitcoin-tweets-sentiment'), I get correct sentiment-classification output.

Can you please help point out my fine-tuning issue and provide some guidance/suggestions? Also, could you please share your training code?

Thanks a lot in advance!

xpang-sf avatar Jun 11 '23 20:06 xpang-sf

@Tungsong Hello, did you ever solve the underfitting problem? I am running into the same issue.

LawlightXY avatar Jun 29 '23 13:06 LawlightXY


Finally I figured out my problem with inference! The saved model file "adapter_model.bin" is not valid (its size is too small). I replaced it with the .bin file in the last checkpoint directory, and inference now works as expected. The final model is saved by "model.save_pretrained".

Hi, nkjulia,

May I ask what you mean by "adapter_model.bin is not valid, with small size"? Do you mean that if you increase the number of epochs, adapter_model.bin will be larger? I am following this website (https://www.mlexpert.io/machine-learning/tutorials/alpaca-fine-tuning#data) exactly to fine-tune the same data, but my adapter_model.bin is only 400 bytes, while the original author's adapter_model.bin is 16 MB.

When I run generation with my fine-tuned model, there is no real response; it just keeps repeating the input. But if I use the author's fine-tuned model ('curiousily/alpaca-bitcoin-tweets-sentiment'), I get correct sentiment-classification output.

Can you please help point out my fine-tuning issue and provide some guidance/suggestions? Also, could you please share your training code?

Thanks a lot in advance!

This happens because of a peft bug that leaves adapter_model.bin empty; replace adapter_model.bin with pytorch_model.bin.
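To confirm the adapter file really is empty before swapping it, a quick check like this works (a sketch; the path is illustrative):

```python
import torch

# Load the saved adapter state dict and count the LoRA tensors it contains.
state = torch.load("lora-alpaca/adapter_model.bin", map_location="cpu")
lora_keys = [k for k in state if "lora" in k]
print(f"{len(lora_keys)} LoRA tensors")  # 0 tensors matches the 400-byte file
assert lora_keys and all(state[k].abs().sum() > 0 for k in lora_keys), \
    "adapter is empty; copy pytorch_model.bin over it"
```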

nkjulia avatar Jul 26 '23 06:07 nkjulia