Results: 3 issues of Samwise
I was using the script from step3_rlhf_finetuning/training_scripts/single_node/run_6.7b.sh and ran into some errors. I used 7B Llama models as the actor and the critic, respectively, and set the enable_hybrid_engine argument, and I got errors like the following: │...
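For reference, a rough sketch of the kind of "hybrid_engine" section that the enable_hybrid_engine argument is expected to switch on in the step-3 DeepSpeed config; the key names follow DeepSpeed's hybrid engine config as I understand it and may differ between versions, and all values here are placeholders rather than the script's actual settings:

```python
# Rough sketch only: the DeepSpeed config section that enable_hybrid_engine
# is meant to toggle. Key names are assumptions based on DeepSpeed's hybrid
# engine config and may vary across DeepSpeed versions; values are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,  # placeholder
    "hybrid_engine": {
        "enabled": True,           # what enable_hybrid_engine is meant to turn on
        "max_out_tokens": 512,     # placeholder generation budget
        "inference_tp_size": 1,    # placeholder tensor-parallel size for generation
        "release_inference_cache": False,
        "pin_parameters": True,
    },
}
```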
During DPO training on some datasets, the chosen rewards recorded in the logger (wandb, tensorboard, etc.) are always negative. Is this normal? Why does this happen?
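For context, a minimal sketch of how the logged chosen/rejected rewards are typically computed in DPO, following the standard beta-scaled log-ratio formulation from the DPO paper; the function and tensor names are illustrative, not the exact code of any particular trainer:

```python
import torch

def dpo_rewards(policy_chosen_logps: torch.Tensor,
                policy_rejected_logps: torch.Tensor,
                ref_chosen_logps: torch.Tensor,
                ref_rejected_logps: torch.Tensor,
                beta: float = 0.1):
    """Illustrative reward bookkeeping in the standard DPO formulation.

    Each *_logps tensor holds the summed log-probability of a whole response
    under the policy or the frozen reference model (one value per example).
    """
    # The "rewards" are beta-scaled log-ratios; they are negative whenever the
    # policy puts less probability mass on a response than the reference does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # The loss only acts on the chosen-minus-rejected margin.
    margin = chosen_rewards - rejected_rewards
    loss = -torch.nn.functional.logsigmoid(margin).mean()
    return loss, chosen_rewards.detach(), rejected_rewards.detach()
```

In this formulation a negative chosen reward only means the policy currently assigns the chosen response less probability than the reference model does; since the loss pushes on the margin rather than the chosen reward itself, negative chosen rewards are not by themselves necessarily a sign of broken training.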