LLaMA-Factory
Qwen: DeepSpeed (ZeRO-3) + DPO error
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
DPO training keeps failing before training even starts (target model: Qwen-14b-chat).
What does assert len(set(t.dtype for t in tensors)) == 1 mean?
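For context, that assertion sits in DeepSpeed's ZeRO-3 defragment step: every parameter partition gets flattened into one contiguous buffer, which is only possible when all parameters share a single dtype, so the failure suggests the model handed to deepspeed.initialize carries mixed dtypes (e.g. some bf16 and some fp32). A paraphrased sketch of the check, not the actual DeepSpeed source:

import torch

def defragment_sketch(tensors):
    # the line that fails in the traceback below: all partitions must share one dtype
    assert len(set(t.dtype for t in tensors)) == 1
    # only then can they be packed into a single flat buffer
    return torch.cat([t.flatten() for t in tensors])

# passes: uniform bf16
defragment_sketch([torch.zeros(4, dtype=torch.bfloat16), torch.ones(2, dtype=torch.bfloat16)])

# raises the same AssertionError: one fp32 tensor mixed in
# defragment_sketch([torch.zeros(4, dtype=torch.bfloat16), torch.ones(2, dtype=torch.float32)])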
DeepSpeed config (v0.14.0)
{
"bf16": {
"enabled": "auto"
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": "auto",
"betas": "auto",
"eps": "auto",
"weight_decay": "auto"
}
},
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"overlap_comm": true,
"contiguous_gradients": true,
"sub_group_size": 1e9,
"reduce_bucket_size": "auto",
"stage3_prefetch_bucket_size": "auto",
"stage3_param_persistence_threshold": "auto",
"stage3_max_live_parameters": 1e9,
"stage3_max_reuse_distance": 1e9,
"stage3_gather_16bit_weights_on_model_save": true
},
"train_batch_size": "auto",
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"train_micro_batch_size_per_gpu": "auto",
"steps_per_print": 1,
"wall_clock_breakdown": false
}
Error Message
[2024-03-10 18:29:22,849] [INFO] [utils.py:800:see_memory_usage] Stage 3 initialize beginning
[2024-03-10 18:29:22,850] [INFO] [utils.py:801:see_memory_usage] MA 8.48 GB Max_MA 11.39 GB CA 33.85 GB Max_CA 34 GB
[2024-03-10 18:29:22,850] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 42.89 GB, percent = 4.3%
[2024-03-10 18:29:22,852] [INFO] [stage3.py:130:__init__] Reduce bucket size 500,000,000
[2024-03-10 18:29:22,852] [INFO] [stage3.py:131:__init__] Prefetch bucket size 50,000,000
[2024-03-10 18:29:23,060] [INFO] [utils.py:800:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2024-03-10 18:29:23,061] [INFO] [utils.py:801:see_memory_usage] MA 8.48 GB Max_MA 8.48 GB CA 33.85 GB Max_CA 34 GB
[2024-03-10 18:29:23,061] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 42.89 GB, percent = 4.3%
Parameter Offload: Total persistent parameters: 1029120 in 201 params
[2024-03-10 18:29:23,295] [INFO] [utils.py:800:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2024-03-10 18:29:23,296] [INFO] [utils.py:801:see_memory_usage] MA 8.48 GB Max_MA 8.48 GB CA 33.85 GB Max_CA 34 GB
[2024-03-10 18:29:23,297] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 42.89 GB, percent = 4.3%
[2024-03-10 18:29:23,505] [INFO] [utils.py:800:see_memory_usage] Before creating fp16 partitions
[2024-03-10 18:29:23,506] [INFO] [utils.py:801:see_memory_usage] MA 8.48 GB Max_MA 8.48 GB CA 33.85 GB Max_CA 34 GB
[2024-03-10 18:29:23,507] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 42.89 GB, percent = 4.3%
Traceback (most recent call last):
File "/myenv/lib/python3.10/site-packages/llmtuner/train/tuner.py", line 37, in run_exp
run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
File "/myenv/lib/python3.10/site-packages/llmtuner/train/dpo/workflow.py", line 50, in run_dpo
trainer = CustomDPOTrainer(
File "/myenv/lib/python3.10/site-packages/llmtuner/train/dpo/trainer.py", line 60, in __init__
self.ref_model = self._prepare_deepspeed(self.ref_model)
File "/myenv/lib/python3.10/site-packages/trl/trainer/dpo_trainer.py", line 447, in _prepare_deepspeed
model, *_ = deepspeed.initialize(model=model, config=config_kwargs)
File "/myenv/lib/python3.10/site-packages/deepspeed/__init__.py", line 176, in initialize
engine = DeepSpeedEngine(args=args,
File "/myenv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 307, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/myenv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1256, in _configure_optimizer
self.optimizer = self._configure_zero_optimizer(basic_optimizer)
File "/myenv/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1579, in _configure_zero_optimizer
optimizer = DeepSpeedZeroOptimizer_Stage3(
File "/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 317, in __init__
self._create_fp16_partitions_with_defragmentation(self.trainable_param_groups)
File "/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 697, in _create_fp16_partitions_with_defragmentation
device_buffer = __class__.defragment(parameter_partitions)
File "/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 529, in defragment
assert len(set(t.dtype for t in tensors)) == 1
AssertionError
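A hedged way to check whether mixed parameter dtypes are really the trigger (a minimal sketch; the checkpoint name and dtype argument are placeholders for whatever the failing run actually loads): count the dtypes of the reference model's parameters before DeepSpeed wraps it. If the counter shows more than one dtype, the defragment assertion above will trip.

from collections import Counter

import torch
from transformers import AutoModelForCausalLM

# placeholder checkpoint; substitute the ref model used in the failing run
ref_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat",
    torch_dtype=torch.bfloat16,   # or "auto" to mirror the default loading path
    trust_remote_code=True,
)

# more than one key here (e.g. torch.bfloat16 and torch.float32) matches the AssertionError above
print(Counter(p.dtype for p in ref_model.parameters()))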
Expected behavior
No response
System Info
- python==3.10
- llmtuner==0.5.3
- deepspeed==0.14.0
- transformers==4.38.2
- cuda==12.3
- hardware: 8x A100 80GB
Others
No response
Training config
{
"stage": "dpo",
"do_train": true,
"model_name_or_path": "(removed)",
"dataset": "(removed)",
"dataset_dir": "../data",
"template": "qwen",
"finetuning_type": "full",
"output_dir": "(removed)",
"overwrite_cache": true,
"overwrite_output_dir": true,
"cutoff_len": 4096,
"val_size": 0,
"evaluation_strategy": "no",
"fp16_full_eval": true,
"per_device_eval_batch_size": 4,
"eval_accumulation_steps": 4,
"per_device_train_batch_size": 1,
"gradient_accumulation_steps": 16,
"gradient_checkpointing": true,
"save_only_model": true,
"save_safetensors": true,
"lr_scheduler_type": "cosine",
"logging_steps": 1,
"save_strategy": "epoch",
"learning_rate": 1e-6,
"save_total_limit": 1,
"num_train_epochs": 1,
"warmup_ratio": 0.05,
"weight_decay": 0.01,
"plot_loss": true,
"accelerator_config": {
"dispatch_batches": false
},
"use_fast_tokenizer": true,
"resume_from_checkpoint": false,
"report_to": "wandb",
"deepspeed": "ds_zero3_dpo.json",
"bf16": true
}
I have the same problem with Qwen1.5-14B-Chat DPO training. Have you solved it yet?
Have you tried --fp16 True, i.e. "fp16": { "enabled": true } in the stage-3 config?
Hi, have you solved the issue? I have met the exact same problem.
Hi, have you solved the issue? I have met the exact same problem. Environment: python 3.10, llamafactory 0.7.2.dev0, torch 2.3.0, transformers 4.37.2 / 4.41, deepspeed 0.13.0 / 0.14.0, cuda 12.1.
tried:
- --bf16
- --fp16
I've just been trying this as well, and I found that it errors out whenever _prepare_deepspeed is used to load the ref_model. I've tried both ZeRO-2 and ZeRO-3 :(
I dug through the source code and it looks like a trl problem: it fails when initializing DeepSpeed with the ref_model. My understanding is that the ref_model should be frozen and never trained, yet trl just sets ref_model = model without making a copy and then initializes DeepSpeed with that ref_model directly. If nothing else works, try quantizing the ref model to 4-bit or 8-bit and skipping DeepSpeed for it entirely.
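A rough sketch of the quantized-ref-model idea above (untested; whether trl / LLaMA-Factory then skips _prepare_deepspeed for the reference model depends on the version, so treat this purely as an illustration): load the reference model separately in 4-bit and keep it frozen, while the policy model stays under ZeRO-3.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# placeholder checkpoint; the reference model only needs forward passes
ref_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    trust_remote_code=True,
)
ref_model.requires_grad_(False)  # never trained, only used for reference log-probs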
It is probably the offload. With ZeRO-3 and no offload it works for me (see the config sketch after the launch command below).
deepspeed --master_port 25002 --include "localhost:4,5,6,7" src/train.py \
--model_name_or_path ${model_path} \
--stage 'dpo' \
--do_train \
--finetuning_type 'full' \
--dpo_ftx 1.0 \
--ddp_timeout 180000000 \
--deepspeed examples/deepspeed/ds_z3_config.json \
--dataset $dataset \
--template llama2 \
--cutoff_len 4096 \
--max_samples 10000000000 \
--overwrite_cache \
--output_dir xxxxx \
--logging_steps 10 \
--save_strategy 'no' \
--plot_loss \
--overwrite_output_dir true \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 8 \
--learning_rate 0.000005 \
--num_train_epochs 2.0 \
--lr_scheduler_type cosine \
--warmup_steps 500 \
--fp16 |tee ${name}.log
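If dropping CPU offload is really what makes the difference, the only change to the OP's ds_zero3_dpo.json would be removing the offload_optimizer block. A minimal sketch of that edit, assuming everything else in the config stays the same:

import json

# strip CPU offload from the existing ZeRO-3 config and write a new file to pass via --deepspeed
with open("ds_zero3_dpo.json") as f:
    cfg = json.load(f)

cfg["zero_optimization"].pop("offload_optimizer", None)  # ZeRO-3 without CPU offload

with open("ds_z3_no_offload.json", "w") as f:
    json.dump(cfg, f, indent=2)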
I was already using the ds_z3_config setup, but I specify a ref_model path in my arguments, and the error appears once that ref_model goes through _prepare_deepspeed initialization. In your setup, does the ref_model no longer error after _prepare_deepspeed initialization? Note: my experiments above were run with LoRA.