Why is the format reward always zero?

Open amoreZgx1n opened this issue 1 month ago • 0 comments

Thanks for your excellent work! I trained RL for 20 steps using your cold-start model, but the format reward is always 0.

Dec 01 '25 02:12 amoreZgx1n