Vision-SR1
Why is the format reward always zero?
Thanks for your excellent work! I ran RL training for 20 steps using your cold-start model, but the format reward is always 0.