fsdfbio

Results 2 issues of fsdfbio

Hi, thanks for sharing your code. From my understanding, the training ratio win is $p_\theta (x_{t-1}^w|c, t, x_t) / p_{\text{ref}} (x_{t-1}^l|c, t, x_t)$. The training ratio win should increase during...

Hi, thank you for open-sourcing your excellent work. I noticed that the SFT-tuned model performs slightly worse than the original model. I am trying to understand why the...