direct-preference-optimization

Question about average_log_prob

Open LSX-Sneakerprogrammer opened this issue 1 year ago • 9 comments

Hi, I see there is a boolean argument (average_log_prob) in _get_batch_logps in trainers.py that controls whether the function returns the average log probability or the summed log probability. I have two questions.

  1. Did you run experiments to see which of the two options performs better?
  2. If I choose the average log probability, I assume the padding done by the pad_to_length function needs to be turned off. Is that right?
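
For concreteness, here is a minimal sketch of the two options being asked about. It is not the code from trainers.py: it is plain Python, the helper name sequence_logp is hypothetical, and it assumes padded positions carry a sentinel label (-100 here) and are skipped before summing or averaging. Under that assumption, padding changes neither the sum nor the average, because the average divides by the number of unmasked tokens only.

```python
IGNORE_INDEX = -100  # assumed sentinel label for padded / ignored positions


def sequence_logp(token_logps, labels, average_log_prob=False):
    """Combine per-token log-probs for one sequence, skipping masked positions.

    Hypothetical helper for illustration; assumes at least one unmasked token.
    """
    kept = [lp for lp, lab in zip(token_logps, labels) if lab != IGNORE_INDEX]
    total = sum(kept)
    if average_log_prob:
        # Divide by the count of real tokens, not by the padded length.
        return total / len(kept)
    return total


# Three real tokens followed by two padded positions.
token_logps = [-0.5, -1.5, -1.0, -9.0, -9.0]
labels = [11, 42, 7, IGNORE_INDEX, IGNORE_INDEX]

summed = sequence_logp(token_logps, labels)
averaged = sequence_logp(token_logps, labels, average_log_prob=True)

# Same sequence without padding, for comparison.
unpadded_averaged = sequence_logp(token_logps[:3], labels[:3], average_log_prob=True)
```

If the real implementation masks padding in this way, the padded and unpadded averages agree; if it averaged over the full padded length instead, they would differ, which is the case question 2 is concerned with.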

I hope you can help me with these questions. Thanks a lot!

LSX-Sneakerprogrammer, Oct 24 '23 05:10