athene reward online dpo fix, speed up ray init

Open uralik opened this issue 5 months ago • 0 comments

The previous online DPO fix did not consider that the athene reward also mimics the dummy batch. Although this is likely not needed there, for now we copy the same logic as in the math verify reward.

uralik • Sep 20 '25 02:09