Samwise

Results: 3 issues by Samwise

I was using the script from step3_rlhf_finetuning/training_scripts/single_node/run_6.7b.sh and ran into some errors. I used 7B Llama models as the actor and the critic respectively and set the enable_hybrid_engine argument, and I got errors like the one below: │...

During DPO training on some datasets, the chosen rewards recorded in the logger (wandb, tensorboard, etc.) are always negative. Is this normal? Why does this happen?
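
For context, here is a minimal sketch of how DPO implementations (e.g. TRL-style trainers) typically compute the "chosen rewards" that get logged to wandb/tensorboard; the function and tensor names below are illustrative assumptions, not this repo's actual code. The implicit DPO reward is beta * (log pi_theta(y|x) - log pi_ref(y|x)), a relative quantity, so a negative chosen reward only means the policy currently assigns the chosen response a lower log-probability than the reference model does.

```python
import torch

beta = 0.1  # assumed DPO temperature (illustrative value)

def dpo_logged_rewards(policy_chosen_logps: torch.Tensor,
                       ref_chosen_logps: torch.Tensor,
                       policy_rejected_logps: torch.Tensor,
                       ref_rejected_logps: torch.Tensor):
    # Implicit DPO reward: beta * (log pi_theta(y|x) - log pi_ref(y|x)).
    # If the policy's log-prob on the chosen response drops below the
    # reference model's, the logged "chosen reward" is negative even
    # when training is otherwise healthy.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # What usually matters is the margin (and reward accuracy), not the sign:
    margin = chosen_rewards - rejected_rewards
    return chosen_rewards, rejected_rewards, margin
```

In other words, the sign of the chosen reward alone is not a health signal; the chosen-minus-rejected margin (and the fraction of pairs where chosen > rejected) is what is usually monitored.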