How to use `predict` function in `DPOTrainer`
I want to get the log-probabilities (logp) and rewards for my data through `predict`, but the prediction seems to include only one example.
What is the correct usage of `predict`?
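For clarity, the per-example reward in question is the implicit DPO reward, `beta * (policy_logp - reference_logp)`. The sketch below only illustrates the expected shape of the output (one value per example). It does not use any TRL API: the function name `dpo_implicit_rewards`, the `beta` value, and the log-probabilities are hypothetical placeholders.

```python
from typing import List


def dpo_implicit_rewards(
    policy_logps: List[float],
    ref_logps: List[float],
    beta: float = 0.1,  # assumed value, not taken from any real config
) -> List[float]:
    """Return one implicit DPO reward per example.

    Each reward is beta * (log pi(y|x) - log pi_ref(y|x)), where the inputs
    are sequence-level log-probabilities under the policy and reference model.
    """
    if len(policy_logps) != len(ref_logps):
        raise ValueError("policy_logps and ref_logps must have the same length")
    return [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]


# Hypothetical sequence log-probabilities for two examples.
policy_chosen_logps = [-12.0, -8.5]
ref_chosen_logps = [-13.0, -8.0]

rewards = dpo_implicit_rewards(policy_chosen_logps, ref_chosen_logps, beta=0.1)
print(rewards)  # one reward per input example
```

The desired result from `predict` would have this per-example form, rather than a single aggregated value.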