direct-preference-optimization
Question about average_log_prob
Hi, I see there is a boolean variable in _get_batch_logps in trainers.py that controls whether the function returns the average log probability or not. I have two questions.
- Did you run experiments to see which option performs better?
- If I choose the average log probability, I think the pad_to_length function needs to be turned off. Is that right?
I hope you can help with these questions. Thanks a lot!
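
To make the second question concrete, here is a minimal sketch of how I understand the flag. It is plain Python written for illustration, not copied from trainers.py; the function name sequence_logp, the 0/1 mask convention, and the numbers are all my own assumptions. If padded positions are masked out like this, padding would not change either result, which is why I am unsure whether pad_to_length matters.

```python
def sequence_logp(token_logps, mask, average_log_prob=False):
    """Combine per-token log-probs into one score, skipping masked positions.

    Hypothetical helper for illustration only; not the repository's code.
    """
    kept = [lp for lp, m in zip(token_logps, mask) if m]
    total = sum(kept)
    if average_log_prob:
        # Divide by the number of unmasked tokens, not the padded length.
        return total / len(kept)
    return total


# Made-up per-token log-probs for a 3-token response.
token_logps = [-0.5, -1.5, -1.0]
mask = [1, 1, 1]

# Same response padded to length 5; pad positions carry junk values but mask 0.
padded_logps = token_logps + [-9.0, -9.0]
padded_mask = mask + [0, 0]

sum_unpadded = sequence_logp(token_logps, mask)
sum_padded = sequence_logp(padded_logps, padded_mask)
avg_unpadded = sequence_logp(token_logps, mask, average_log_prob=True)
avg_padded = sequence_logp(padded_logps, padded_mask, average_log_prob=True)

print(sum_unpadded, sum_padded)
print(avg_unpadded, avg_padded)
```

In this sketch the padded and unpadded scores match in both modes, so the average would only be affected if the division used the padded length instead of the mask count.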