Samwise

Results: 3 issues by Samwise

I was using the script from step3_rlhf_finetuning/training_scripts/single_node/run_6.7b.sh and ran into some errors. I used 7B Llama models as the actor and the critic respectively and set the enable_hybrid_engine argument, and I got errors like the one below: │...

During DPO training on some datasets, the chosen rewards recorded in the logger (wandb, tensorboard, etc.) are always negative. Is this normal? Why does this happen?
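
For context, here is a minimal sketch of how DPO implementations (e.g. TRL-style trainers) typically compute the "chosen rewards" that get logged to wandb/tensorboard; the function and tensor names below are illustrative assumptions, not this repo's actual code. The implicit DPO reward is beta * (log pi_theta(y|x) - log pi_ref(y|x)), a relative quantity, so a negative chosen reward only means the policy currently assigns the chosen response a lower log-probability than the reference model does.

```python
import torch

beta = 0.1  # assumed DPO temperature (illustrative value)

def dpo_logged_rewards(policy_chosen_logps: torch.Tensor,
                       ref_chosen_logps: torch.Tensor,
                       policy_rejected_logps: torch.Tensor,
                       ref_rejected_logps: torch.Tensor):
    # Implicit DPO reward: beta * (log pi_theta(y|x) - log pi_ref(y|x)).
    # If the policy's log-prob on the chosen response drops below the
    # reference model's, the logged "chosen reward" is negative even
    # when training is otherwise healthy.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # What usually matters is the margin (and reward accuracy), not the sign:
    margin = chosen_rewards - rejected_rewards
    return chosen_rewards, rejected_rewards, margin
```

In other words, the sign of the chosen reward alone is not a health signal; the chosen-minus-rejected margin (and the fraction of pairs where chosen > rejected) is what is usually monitored.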