trl Why compute IPO loss using `average_log

Why compute IPO loss using average_log_prob=Ture In function concatenated_forward，when the loss_type equals 'ipo' the parameter average_log_prob will be set True, but according to the loss formula of IPO, the length m as a factor will be eliminated

all_logps = self.get_batch_logps(
    all_logits,
    concatenated_batch["concatenated_labels"],
    average_log_prob=self.loss_type == "ipo",
    is_encoder_decoder=self.is_encoder_decoder,
    label_pad_token_id=self.label_pad_token_id,
)

May 28 '24 17:05 AIR-hl

This has been discussed in multiple github issues, and I believe the answer stems from the discussion that Huggingface had with the IPO authors here "After consulting with the authors of the IPO paper, we discovered that the implementation of IPO in TRL was incorrect; in particular, the loss over the log-likelihoods of the completions needs to be averaged instead of summed. We have added a fix..."

Jun 01 '24 15:06 QiyaoWei

This has been discussed in multiple github issues, and I believe the answer stems from the discussion that Huggingface had with the IPO authors here "After consulting with the authors of the IPO paper, we discovered that the implementation of IPO in TRL was incorrect; in particular, the loss over the log-likelihoods of the completions needs to be averaged instead of summed. We have added a fix..."

@QiyaoWei Please forgive me, I don't find any formula using average log-likelihood in paper

Jun 02 '24 08:06 AIR-hl

This has been discussed in multiple github issues, and I believe the answer stems from the discussion that Huggingface had with the IPO authors here "After consulting with the authors of the IPO paper, we discovered that the implementation of IPO in TRL was incorrect; in particular, the loss over the log-likelihoods of the completions needs to be averaged instead of summed. We have added a fix..."

@QiyaoWei Please forgive me, I don't find any formula using average log-likelihood in paper

You can conduct practical experiments to ensure that the training is more stable and the results perform better. But now there is no need for average_log_probe=Ture, it has been placed outside!

Jun 07 '24 08:06 Trangle

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Jul 11 '24 15:07 github-actions[bot]

Why compute IPO loss using `average_log_prob=Ture`？