LLaMA-Factory
Reward Model training regresses at epoch boundaries
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
Launch command:
```shell
torchrun --nproc_per_node $NPROC_PER_NODE \
    --nnodes $NNODES \
    --node_rank $RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT \
    ../../src/train_bash.py \
    --deepspeed ds_z1_config.json \
    --stage rm \
    --do_train \
    --model_name_or_path /root/model/CodeLlama-7b-hf/ \
    --create_new_adapter \
    --dataset codesftpreferv1 \
    --dataset_dir ${DATA_DIR} \
    --template default \
    --finetuning_type full \
    --output_dir ../../saves/LLaMA2-7B/full/sft4krm6w8kep5 \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 8192 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 250 \
    --eval_steps 500 \
    --evaluation_strategy steps \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --max_samples 400000000 \
    --val_size 0.001 \
    --plot_loss \
    --bf16 \
    --save_total_limit 10 \
    --report_to tensorboard
```
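As a sanity check on the flags above, the effective global batch size they imply can be computed (this is a sketch assuming a single node with 8 GPUs, i.e. `NPROC_PER_NODE=8`, `NNODES=1`, matching the 8× A800 setup described below):

```python
# Effective global batch size implied by the launch flags above,
# assuming NPROC_PER_NODE=8 and NNODES=1 (single node, 8x A800).
per_device_train_batch_size = 2
gradient_accumulation_steps = 2
num_gpus = 8

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 32 preference pairs per optimizer step
```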
When training the reward model on 8× A800 GPUs, the regression at epoch boundaries appears.
The same environment was previously used for SFT training (both 4 nodes × 8 GPUs and a single node with 8 GPUs) without any issues.
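For context on what the `--stage rm` run optimizes: reward-model training typically uses a pairwise (Bradley-Terry) loss, `-log sigmoid(r_chosen - r_rejected)`. Below is a minimal sketch of that loss; the function name and tensor shapes are illustrative, not the project's actual code:

```python
import torch
import torch.nn.functional as F

def pairwise_rm_loss(chosen_rewards: torch.Tensor,
                     rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: push chosen rewards above rejected ones.

    In a healthy run this loss decreases across epochs; a jump back up
    at an epoch boundary is the regression reported in this issue.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy example: chosen scored higher than rejected -> small loss.
chosen = torch.tensor([1.5, 0.8])
rejected = torch.tensor([0.2, -0.3])
loss = pairwise_rm_loss(chosen, rejected)
print(float(loss))  # small positive value
```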
Expected behavior
No response
System Info
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.
- `transformers` version: 4.38.1
- Platform: Linux-5.10.0-1.0.0.28-x86_64-with-glibc2.27
- Python version: 3.9.18
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: 0.27.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.1+cu118 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes (8× A800)
- Using distributed or parallel set-up in script?: yes (torchrun with DeepSpeed ZeRO-1)
Others
No response