Why compute IPO loss using `average_log_prob=True`?
In the function `concatenated_forward`, when `loss_type` equals 'ipo', the parameter `average_log_prob` is set to True, as in the call below. But according to the IPO loss formula, the length m would be eliminated as a factor.
    all_logps = self.get_batch_logps(
        all_logits,
        concatenated_batch["concatenated_labels"],
        average_log_prob=self.loss_type == "ipo",
        is_encoder_decoder=self.is_encoder_decoder,
        label_pad_token_id=self.label_pad_token_id,
    )
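To make the question concrete, here is a minimal, framework-free sketch of what a flag like `average_log_prob` changes when per-token log-probabilities are reduced to one value per sequence. The helper `sequence_logp` is hypothetical and is not TRL's actual `get_batch_logps`; the pad label value of -100 and the requirement of at least one non-padding token are assumptions made for illustration.

```python
# Hypothetical helper for illustration only; not TRL's get_batch_logps.
def sequence_logp(token_logps, labels, label_pad_token_id=-100, average_log_prob=False):
    """Reduce per-token log-probabilities to a single value per sequence.

    Tokens whose label equals label_pad_token_id are ignored.
    Assumes the sequence has at least one non-padding token.
    """
    kept = [lp for lp, lab in zip(token_logps, labels) if lab != label_pad_token_id]
    total = sum(kept)
    if average_log_prob:
        # Divide by the number of real tokens, so length no longer scales the value.
        return total / len(kept)
    return total


# Three real tokens followed by one padding position.
token_logps = [-0.5, -1.0, -1.5, -9.9]
labels = [11, 12, 13, -100]

summed = sequence_logp(token_logps, labels)
averaged = sequence_logp(token_logps, labels, average_log_prob=True)
print(summed, averaged)
```

With the flag off, the result is the summed log-likelihood of the completion; with it on, the result is the per-token average, so the completion length drops out.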
This has been discussed in multiple GitHub issues, and I believe the answer stems from the discussion that Hugging Face had with the IPO authors, quoted here: "After consulting with the authors of the IPO paper, we discovered that the implementation of IPO in TRL was incorrect; in particular, the loss over the log-likelihoods of the completions needs to be averaged instead of summed. We have added a fix..."
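As a rough illustration of why the choice matters, the sketch below plugs summed and averaged log-likelihoods into the IPO squared loss, `(h - 1/(2*beta))**2`, where `h` is the policy-vs-reference log-ratio gap between the chosen and rejected completions. The helper names, the per-token values, and `beta=0.1` are all made up for the example; this is not TRL's code and is not presented as the authors' reasoning.

```python
# Illustrative sketch only; names and numbers are hypothetical.
def reduce_logps(token_logps, average):
    """Sum per-token log-probs, or average them when average is True."""
    total = sum(token_logps)
    return total / len(token_logps) if average else total


def ipo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """IPO-style loss: squared distance of the log-ratio gap from 1 / (2 * beta)."""
    h = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)
    return (h - 1.0 / (2.0 * beta)) ** 2


def gap_for_length(length, average):
    """Log-ratio gap h when every token has the same (made-up) log-prob."""
    policy_chosen = reduce_logps([-1.0] * length, average)
    policy_rejected = reduce_logps([-1.2] * length, average)
    ref_chosen = reduce_logps([-1.1] * length, average)
    ref_rejected = reduce_logps([-1.1] * length, average)
    return (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)


for length in (10, 100):
    # Summed gap grows with length; averaged gap stays the same.
    print(length, gap_for_length(length, average=False), gap_for_length(length, average=True))
```

In this toy setup the summed gap grows in proportion to the completion length while the regression target `1/(2*beta)` stays fixed, whereas the averaged gap is the same for both lengths.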
@QiyaoWei Please forgive me, but I can't find any formula in the paper that uses the average log-likelihood.
You can run experiments to confirm that training is more stable and the results are better with averaging. But `average_log_prob=True` is no longer needed here; it has been moved outside!