stainless-steel-rat

Results: 1 issue by stainless-steel-rat

Hi, I would like to use both a reward model and a reward function simultaneously during PPO training. Is such a hybrid reward supported? If so, could you provide an...