[Question] Why don't the TR-DPO default alpha and tau match the values suggested in the paper?

Open qgallouedec opened this issue 1 year ago • 1 comments

(cc @syrn1k, author of #1593) In the paper, they seem to recommend α = 0.6, τ = 512

[Screenshots: 2024-08-28 at 17 58 11, 2024-08-28 at 17 58 29]

while in trl, we have α = 0.9, τ = 64

https://github.com/huggingface/trl/blob/10f70fa3337826ffb8c2e0eb0de00051ea53563b/trl/trainer/dpo_config.py#L143-L144
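For context, a minimal sketch of what the two parameters control, assuming the TR-DPO soft update is π_ref ← α·π_θ + (1 − α)·π_ref applied every τ steps. The helpers `soft_update` and `train` are hypothetical names for illustration, not trl functions, and the "model" is a single float standing in for the weights.

```python
def soft_update(ref_params, policy_params, alpha):
    """Mix the policy into the reference: alpha * policy + (1 - alpha) * ref."""
    return [alpha * p + (1.0 - alpha) * r for r, p in zip(ref_params, policy_params)]


def train(num_steps, alpha, tau):
    """Toy loop: the policy drifts by +1.0 per step; the reference syncs every tau steps."""
    policy = [0.0]
    ref = [0.0]
    for step in range(1, num_steps + 1):
        policy = [p + 1.0 for p in policy]  # stand-in for an optimizer step
        if step % tau == 0:
            ref = soft_update(ref, policy, alpha)  # assumed TR-DPO soft update
    return policy, ref


if __name__ == "__main__":
    # Same number of steps, the two settings discussed in this issue.
    print(train(512, alpha=0.9, tau=64))   # frequent, aggressive syncs
    print(train(512, alpha=0.6, tau=512))  # one gentler sync at the end
```

With a larger α the reference jumps closer to the policy at each sync, and with a smaller τ it syncs more often, so α = 0.9, τ = 64 keeps the reference much closer to the policy than α = 0.6, τ = 512.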

qgallouedec, Aug 28 '24 16:08

TODO:

  1. Run an experiment with our current default values
  2. Run the same experiment with the paper's default values

If 1. is better, close the issue. If 2. is better, or if 1. and 2. are the same, change our default values.
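A rough sketch of the comparison, in case it helps whoever picks this up. The keyword names below are assumed to match the `DPOConfig` fields behind the linked lines (`sync_ref_model`, `ref_model_mixup_alpha`, `ref_model_sync_steps`); the `decide` helper and the scores passed to it are hypothetical, with higher assumed to mean better on whatever evaluation metric is chosen.

```python
# Assumed DPOConfig keyword names; values are the ones quoted in this issue.
CURRENT_DEFAULTS = {
    "sync_ref_model": True,
    "ref_model_mixup_alpha": 0.9,
    "ref_model_sync_steps": 64,
}
PAPER_VALUES = {
    "sync_ref_model": True,
    "ref_model_mixup_alpha": 0.6,
    "ref_model_sync_steps": 512,
}


def decide(current_score, paper_score):
    """Apply the rule above: keep the defaults only if they are strictly better."""
    if current_score > paper_score:
        return "close the issue"
    # Paper values better, or a tie: align the defaults with the paper.
    return "change the default values"
```

Each dict could be unpacked into the training config for its run, with everything else (model, dataset, seed) held fixed so that only α and τ differ.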

Open to contributions!

qgallouedec, Oct 20 '24 17:10

I would like to help, could you assign this to me?

Beichen-Ma, Dec 07 '24 05:12

I am interested in trying this!

Ishan-Kumar2, Feb 20 '25 22:02