[Question] Why don't the TR-DPO default alpha and tau match the values suggested in the paper?

Open: qgallouedec opened this issue 1 year ago • 1 comment

(cc @syrn1k, author of #1593) The paper seems to recommend α = 0.6, τ = 512, while trl defaults to α = 0.9, τ = 64:

https://github.com/huggingface/trl/blob/10f70fa3337826ffb8c2e0eb0de00051ea53563b/trl/trainer/dpo_config.py#L143-L144
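For readers unfamiliar with the two knobs, here is a minimal sketch of what they control, assuming the soft-update rule π_ref ← α·π_θ + (1 − α)·π_ref applied every τ training steps. The helper names `soft_update` and `sync_steps` are hypothetical and plain Python lists stand in for model weights; this is not trl's implementation. In trl, I believe the corresponding `DPOConfig` fields are `ref_model_mixup_alpha` and `ref_model_sync_steps`.

```python
from typing import List


def soft_update(ref: List[float], policy: List[float], alpha: float) -> List[float]:
    """Return alpha * policy + (1 - alpha) * ref, element-wise.

    Assumed TR-DPO soft-update rule; the lists stand in for model parameters.
    """
    if len(ref) != len(policy):
        raise ValueError("ref and policy must have the same length")
    return [alpha * p + (1.0 - alpha) * r for r, p in zip(ref, policy)]


def sync_steps(num_steps: int, tau: int) -> List[int]:
    """List the 1-indexed training steps at which the reference is synced."""
    return [step for step in range(1, num_steps + 1) if step % tau == 0]


# The two settings discussed in this issue.
PAPER = {"alpha": 0.6, "tau": 512}
TRL_DEFAULT = {"alpha": 0.9, "tau": 64}

if __name__ == "__main__":
    ref, policy = [0.0, 0.0], [1.0, 2.0]
    for name, cfg in (("paper", PAPER), ("trl default", TRL_DEFAULT)):
        updated = soft_update(ref, policy, cfg["alpha"])
        n_syncs = len(sync_steps(1024, cfg["tau"]))
        print(f"{name}: ref after one sync = {updated}, syncs in 1024 steps = {n_syncs}")
```

Under these assumptions, the trl defaults pull the reference model closer to the policy on each sync (α = 0.9 vs. 0.6) and sync eight times as often (every 64 steps vs. every 512), so the two settings constrain the policy quite differently.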

Aug 28 '24 16:08 qgallouedec

TODO:

If 1. is better, close the issue. If 2. is better, or if 1. and 2. are the same, change our default values.

Open to contributions!

Oct 20 '24 17:10 qgallouedec

I would like to help, could you assign this to me?

Dec 07 '24 05:12 Beichen-Ma

I am interested in trying this!

Feb 20 '25 22:02 Ishan-Kumar2