LLaMA-Factory
Reward Model training regresses at epoch boundaries
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
Launch command:
```shell
torchrun --nproc_per_node $NPROC_PER_NODE \
    --nnodes $NNODES \
    --node_rank $RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT \
    ../../src/train_bash.py \
    --deepspeed ds_z1_config.json \
    --stage rm \
    --do_train \
    --model_name_or_path /root/model/CodeLlama-7b-hf/ \
    --create_new_adapter \
    --dataset codesftpreferv1 \
    --dataset_dir ${DATA_DIR} \
    --template default \
    --finetuning_type full \
    --output_dir ../../saves/LLaMA2-7B/full/sft4krm6w8kep5 \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 8192 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 250 \
    --eval_steps 500 \
    --evaluation_strategy steps \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --max_samples 400000000 \
    --val_size 0.001 \
    --plot_loss \
    --bf16 \
    --save_total_limit 10 \
    --report_to tensorboard
```
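As a sanity check on the flags above, the effective global batch size they imply can be computed (this is a sketch assuming a single node with 8 GPUs, i.e. `NPROC_PER_NODE=8`, `NNODES=1`, matching the 8× A800 setup described below):

```python
# Effective global batch size implied by the launch flags above,
# assuming NPROC_PER_NODE=8 and NNODES=1 (single node, 8x A800).
per_device_train_batch_size = 2
gradient_accumulation_steps = 2
num_gpus = 8

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 32 preference pairs per optimizer step
```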
When training the reward model on 8× A800 GPUs, the regression at epoch boundaries appears.
The same environment was previously used for SFT training (both 4 nodes × 8 GPUs and a single node with 8 GPUs) without any issues.
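For context on what the `--stage rm` run optimizes: reward-model training typically uses a pairwise (Bradley-Terry) loss, `-log sigmoid(r_chosen - r_rejected)`. Below is a minimal sketch of that loss; the function name and tensor shapes are illustrative, not the project's actual code:

```python
import torch
import torch.nn.functional as F

def pairwise_rm_loss(chosen_rewards: torch.Tensor,
                     rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: push chosen rewards above rejected ones.

    In a healthy run this loss decreases across epochs; a jump back up
    at an epoch boundary is the regression reported in this issue.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy example: chosen scored higher than rejected -> small loss.
chosen = torch.tensor([1.5, 0.8])
rejected = torch.tensor([0.2, -0.3])
loss = pairwise_rm_loss(chosen, rejected)
print(float(loss))  # small positive value
```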
Expected behavior
No response
System Info
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.
- `transformers` version: 4.38.1
- Platform: Linux-5.10.0-1.0.0.28-x86_64-with-glibc2.27
- Python version: 3.9.18
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: 0.27.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.1+cu118 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes (8× A800)
- Using distributed or parallel set-up in script?: yes (torchrun with DeepSpeed ZeRO-1)
Others
No response