Baichuan-7B
Implements LoRA fine-tuning of the baichuan-7B model.
Supports SFT and RLHF pipelines on instruction datasets such as Alpaca: https://github.com/hiyouga/LLaMA-Efficient-Tuning
LoRA fine-tuning runs on a single RTX 3090 GPU, and QLoRA is supported as well (12 GB VRAM minimum).
LoRA weights of the fine-tuned model: https://huggingface.co/hiyouga/baichuan-7b-sft
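For quick reference, here is a minimal sketch of my own (not part of the original post) showing how these LoRA weights can be attached to the base model with peft for inference; the repo IDs are assumptions based on the links above and may need adjusting for your environment:

# Hedged sketch: load baichuan-7B and attach the published LoRA adapter with peft.
# The repo IDs below are assumptions taken from the links in this thread.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "baichuan-inc/baichuan-7B"     # base model (or a local folder)
adapter = "hiyouga/baichuan-7b-sft"         # LoRA weights linked above

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token_id = 0                  # same workaround discussed later in this thread

model = AutoModelForCausalLM.from_pretrained(
    base_model, trust_remote_code=True, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)  # attach the LoRA adapter
model.eval()

inputs = tokenizer("Hello, please introduce yourself.", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Note that the adapter was trained with the ziya prompt template (see the discussion below), so wrapping the prompt in that template should give results closer to the screenshots.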
Run the following command to perform instruction tuning on the Alpaca dataset:
CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
--model_name_or_path path_to_baichuan-7B_folder_or_huggingface_repo_id \
--do_train \
--dataset alpaca_gpt4_zh \
--finetuning_type lora \
--lora_rank 8 \
--lora_target W_pack \
--output_dir alpaca_baichuan \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 100 \
--eval_steps 100 \
--learning_rate 5e-5 \
--max_grad_norm 0.5 \
--num_train_epochs 3.0 \
--dev_ratio 0.01 \
--evaluation_strategy steps \
--load_best_model_at_end \
--plot_loss \
--fp16
Example screenshot of the training run:
Chat output after LoRA instruction tuning:
Awesome, that was fast!
Awesome.
You're incredible.
Is there an example of the fine-tuning dataset format?
@GalSang17 It comes with the project; open the data folder to see the example format.
Thanks!
Nice 👍🏻
@hiyouga Did you not run into this error?
./aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [51,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [52,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [53,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [54,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [55,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [56,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [57,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [58,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [59,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
@bytes-lost What is the full error message, and which line of code triggers it?
@hiyouga
[INFO|trainer.py:622] 2023-06-15 17:12:03,926 >> Using cuda_amp half precision backend
[INFO|trainer.py:1779] 2023-06-15 17:12:03,933 >> ***** Running training *****
[INFO|trainer.py:1780] 2023-06-15 17:12:03,934 >> Num examples = 48,329
[INFO|trainer.py:1781] 2023-06-15 17:12:03,934 >> Num Epochs = 3
[INFO|trainer.py:1782] 2023-06-15 17:12:03,934 >> Instantaneous batch size per device = 4
[INFO|trainer.py:1783] 2023-06-15 17:12:03,934 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1784] 2023-06-15 17:12:03,934 >> Gradient Accumulation steps = 8
[INFO|trainer.py:1785] 2023-06-15 17:12:03,934 >> Total optimization steps = 4,530
[INFO|trainer.py:1786] 2023-06-15 17:12:03,935 >> Number of trainable parameters = 4,194,304
0%| | 0/4530 [00:00<?, ?it/s]
0%| | 1/4530 [00:04<5:45:55, 4.58s/it]
0%| | 2/4530 [00:07<4:42:43, 3.75s/it]Traceback (most recent call last):
File "/mnt/data/user/LLaMA-Efficient-Tuning/src/train_sft.py", line 97, in <module>
main()
File "/mnt/data/user/LLaMA-Efficient-Tuning/src/train_sft.py", line 69, in main
train_result = trainer.train()
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 2735, in training_step
loss = self.compute_loss(model, inputs)
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 2767, in compute_loss
outputs = model(**inputs)
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/peft/peft_model.py", line 678, in forward
return self.base_model(
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/baichuan-7b/modeling_baichuan.py", line 617, in forward
outputs = self.model(
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/baichuan-7b/modeling_baichuan.py", line 501, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 89, in forward
ctx.fwd_gpu_devices, ctx.fwd_gpu_states = get_device_states(*args)
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 50, in get_device_states
fwd_gpu_states.append(torch.cuda.get_rng_state())
File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/cuda/random.py", line 31, in get_rng_state
return default_generator.get_state()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
@bytes-lost It looks like an index went out of bounds. I manually set pad_token_id to 0 when loading the tokenizer; check whether you have done the same. The input sequences must not contain any value greater than or equal to 64000.
@hiyouga I added a line here in train_sft.py, but I still get the same error:
model, tokenizer = load_pretrained(model_args, finetuning_args, training_args.do_train, stage="sft")
tokenizer.pad_token_id = 0  # set pad_token_id explicitly
dataset = preprocess_data(dataset, tokenizer, data_args, training_args, stage="sft")
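If the same assertion persists, here is a small debugging sketch of my own (not from the original reply) for checking that the preprocessed inputs stay within baichuan-7B's 64000-token vocabulary; it assumes the dataset returned by preprocess_data exposes an "input_ids" column:

# Hedged sketch: scan the tokenized dataset for out-of-range token ids.
# Assumes `dataset` is the object returned by preprocess_data above and that each
# example carries an "input_ids" list; 64000 is baichuan-7B's vocabulary size.
vocab_size = 64000
bad_examples = [
    idx for idx, example in enumerate(dataset)
    if any(tok < 0 or tok >= vocab_size for tok in example["input_ids"])
]
print(f"{len(bad_examples)} examples contain out-of-range token ids:", bad_examples[:10])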
@bytes-lost It looks like something went wrong in torch's gradient checkpointing; it may be related to your local torch and CUDA setup. I tested several times on my side without any problem.
OK, I'll recreate the environment and try again. Is torch==2.0.1 fine?
On my side the model keeps talking to itself, and its answer to "who are you" is not what I need, even though my fine-tuning code is exactly the same as above.
@gebilaoman When launching with the project's built-in cli_demo, please add the --prompt_template ziya argument.
So fast, impressive.
I've also implemented LoRA fine-tuning of baichuan-7b. The baichuan architecture is the same as LLaMA, so its SFT procedure is essentially the same as for bloom/llama.
Project that supports baichuan-7b fine-tuning: https://github.com/shibing624/MedicalGPT
The project also implements the full GPT training pipeline: continued pre-training, supervised fine-tuning, reward modeling, and reinforcement learning.
Run the following command to perform instruction tuning on the belle dataset:
python3 supervised_finetuning.py \
--model_type auto \
--model_name_or_path baichuan-inc/baichuan-7B \
--train_file_dir ./data/finetune \
--validation_file_dir ./data/finetune \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--use_peft True \
--max_train_samples 1000 \
--max_eval_samples 10 \
--num_train_epochs 1 \
--learning_rate 2e-5 \
--warmup_ratio 0.05 \
--weight_decay 0.05 \
--logging_strategy steps \
--logging_steps 10 \
--eval_steps 50 \
--evaluation_strategy steps \
--save_steps 500 \
--save_strategy steps \
--save_total_limit 3 \
--gradient_accumulation_steps 1 \
--preprocessing_num_workers 1 \
--max_source_length 256 \
--max_target_length 256 \
--output_dir outputs-sft-baichuan-v1 \
--overwrite_output_dir \
--ddp_timeout 30000 \
--logging_first_step True \
--target_modules all \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout 0.05 \
--fp16 \
--torch_dtype float16 \
--device_map auto \
--report_to tensorboard \
--ddp_find_unused_parameters False \
--gradient_checkpointing True
Screenshot of the training run (loss decreases steadily):
Everyone is welcome to test it and verify the results.
I had the same problem; after setting tokenizer.pad_token_id = 0 it worked.
I'm getting this error when running; do I need to modify the model's config.json?
It's not the ChatGLM code, it's the LLaMA one: https://github.com/hiyouga/LLaMA-Efficient-Tuning
Can it fine-tune for multi-turn dialogue? Could you show the exact data format for multi-turn conversations? Thanks.
@usun1997 Multi-turn dialogue is supported; see this format for reference: https://github.com/hiyouga/LLaMA-Efficient-Tuning/blob/main/data/example_dataset/examples.json
@hiyouga Hi, when launching the project's built-in cli_demo, why do we need to add the --prompt_template ziya argument? Why ziya? Shouldn't it be baichuan?
@cristianohello Because I used the ziya template when fine-tuning 😁 @usun1997 Correct.
@hiyouga Hi, thanks for the reply. I've also run into the model asking and answering itself for several rounds in a row. How can this be fixed?
@cristianohello The current SFT model wasn't trained on multi-turn dialogue, so issues occasionally appear in multi-turn conversations.
Thanks. Let me show my understanding of the format; please check whether it's right. Suppose my fine-tuning data contains only a single conversation, and that conversation has three turns.
[
  {
    "instruction": "my final-turn question",
    "input": "",
    "output": "the model's final-turn answer",
    "history": [
      ["my first-turn question", "the model's first-turn answer"],
      ["my second-turn question", "the model's second-turn answer"]
    ]
  }
]
In other words, if a dict entry in the list contains the key history, that entry represents a multi-turn conversation: its instruction, input, and output refer to the final turn, and history lists the earlier turns in order.
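To make the mapping concrete, here is a small illustration of my own (not from the thread) that builds one such multi-turn example, assuming the format shown in examples.json above:

# Hedged sketch: flatten a three-turn conversation into the (instruction, output, history) format.
# The last turn becomes instruction/output; all earlier turns go into history, oldest first.
import json

turns = [
    ("first-turn question", "first-turn answer"),
    ("second-turn question", "second-turn answer"),
    ("final-turn question", "final-turn answer"),
]

example = {
    "instruction": turns[-1][0],                      # final user question
    "input": "",
    "output": turns[-1][1],                           # final model answer
    "history": [list(turn) for turn in turns[:-1]],   # earlier turns, in order
}
print(json.dumps([example], ensure_ascii=False, indent=2))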
@hiyouga
My situation is:
when I ask "who are you", the model asks and answers itself for many rounds before it stops. How can I make it give a single answer per question?
OK, thanks.
