[Question] Why don't the TR-DPO default alpha and tau match the values suggested in the paper?
(cc @syrn1k, author of #1593) The paper seems to recommend α = 0.6, τ = 512,
while trl currently defaults to α = 0.9, τ = 64:
https://github.com/huggingface/trl/blob/10f70fa3337826ffb8c2e0eb0de00051ea53563b/trl/trainer/dpo_config.py#L143-L144
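For context, here is a toy sketch of how the two settings behave differently. It assumes the soft update described in the TR-DPO paper, where every τ steps the reference is replaced by α · policy + (1 − α) · reference. The `soft_update` and `run` helpers and the one-parameter "model" are hypothetical and only illustrate the schedule; this is not trl's implementation.

```python
def soft_update(ref_params, policy_params, alpha):
    """Assumed TR-DPO soft update: ref <- alpha * policy + (1 - alpha) * ref."""
    return [alpha * p + (1.0 - alpha) * r for r, p in zip(ref_params, policy_params)]


def run(alpha, tau, total_steps):
    """Toy loop: the policy drifts by 1.0 per step; the reference syncs every tau steps."""
    policy, ref = [0.0], [0.0]
    for step in range(1, total_steps + 1):
        policy = [p + 1.0 for p in policy]  # stand-in for an optimizer step
        if step % tau == 0:
            ref = soft_update(ref, policy, alpha)
    return policy[0], ref[0]


# (final policy value, final reference value) after 512 toy steps
trl_defaults = run(alpha=0.9, tau=64, total_steps=512)
paper_values = run(alpha=0.6, tau=512, total_steps=512)
print("trl defaults:", trl_defaults)
print("paper values:", paper_values)
```

In this toy setting the trl defaults keep the reference much closer to the policy, because it syncs more often and with a larger mixing weight. Whether that helps or hurts in real training is exactly what the experiment should tell us.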
TODO:
1. Run an experiment with our current default values.
2. Run the same experiment with the paper's suggested values.

If 1 is better, close the issue. If 2 is better, or if 1 and 2 perform the same, change our default values.
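A rough sketch of the two runs and the decision rule above, for anyone picking this up. The field names `sync_ref_model`, `ref_model_mixup_alpha`, and `ref_model_sync_steps` are assumed to match `DPOConfig` in the linked file, so please check them there. The `decide` helper, its `tolerance` argument, and the "higher score is better" convention are hypothetical.

```python
# Keyword arguments for the two runs. Field names are assumed to match
# trl's DPOConfig; verify them against the linked dpo_config.py.
CURRENT_DEFAULTS = {
    "sync_ref_model": True,
    "ref_model_mixup_alpha": 0.9,
    "ref_model_sync_steps": 64,
}
PAPER_VALUES = {
    "sync_ref_model": True,
    "ref_model_mixup_alpha": 0.6,
    "ref_model_sync_steps": 512,
}


def decide(current_score, paper_score, tolerance=0.0):
    """Apply the rule from the TODO, assuming a higher score is better.

    The defaults are kept only if the current values win by more than
    `tolerance`; a tie or a win for the paper's values changes them.
    """
    if current_score > paper_score + tolerance:
        return "close the issue"
    return "change the defaults"
```

Everything else (model, dataset, seed, evaluation metric) should be identical between the two runs so that only α and τ differ.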
Open to contributions!
I would like to help, could you assign this to me?
I am interested in trying this!