Vision-SR1

Why is the format reward always zero?

Open · amoreZgx1n opened this issue 1 month ago · 0 comments

Thanks for your excellent work! I trained RL for 20 steps starting from your cold-start model, but the format reward is always 0.
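One possible cause of a format reward that never leaves 0 is a strict pattern check that the decoded rollouts never satisfy, for example because of a missing closing tag or extra text around the expected template. The sketch below is a minimal diagnostic under stated assumptions: it uses a hypothetical `<think>...</think>` followed by `<answer>...</answer>` template, and the names `FORMAT_PATTERN`, `format_reward`, and `diagnose` are illustrative only. The actual Vision-SR1 template and reward function are not shown in this issue.

```python
import re

# Hypothetical template (an assumption, not Vision-SR1's confirmed format):
# reasoning inside <think>...</think>, then the final answer inside <answer>...</answer>.
FORMAT_PATTERN = re.compile(r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL)


def format_reward(response: str) -> float:
    """Return 1.0 if the whole response (ignoring surrounding whitespace) matches the template."""
    return 1.0 if FORMAT_PATTERN.fullmatch(response.strip()) else 0.0


def diagnose(responses: list[str]) -> float:
    """Print every response that fails the format check and return the mean format reward."""
    rewards = [format_reward(r) for r in responses]
    for r, score in zip(responses, rewards):
        if score == 0.0:
            print("FORMAT MISS:", repr(r[:200]))  # show the start of the raw text
    return sum(rewards) / len(rewards) if rewards else 0.0


if __name__ == "__main__":
    samples = [
        "<think>The image shows two cats.</think> <answer>2</answer>",
        "The answer is 2.",  # no tags at all
        "<think>Counting...</think>\n<answer>2",  # missing closing </answer> tag
    ]
    print("mean format reward:", diagnose(samples))
```

If every sampled rollout is reported as a miss, comparing the raw decoded text (before any post-processing) against the pattern the training reward actually uses is a quick way to see whether the model's output or the check itself is at fault.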

amoreZgx1n, Dec 01 '25 02:12