fairseq2
Athene reward online DPO fix, speed up Ray init
The previous online DPO fix did not account for the fact that the athene reward also mimics the dummy batch. This is likely not needed, but for now we copy the same logic used in the math verify reward.