trl icon indicating copy to clipboard operation
trl copied to clipboard

Why compute IPO loss using `average_log_prob=Ture`?

Open AIR-hl opened this issue 1 year ago • 2 comments

Why compute IPO loss using average_log_prob=Ture In function concatenated_forward,when the loss_type equals 'ipo' the parameter average_log_prob will be set True, but according to the loss formula of IPO, the length m as a factor will be eliminated

image

all_logps = self.get_batch_logps(
    all_logits,
    concatenated_batch["concatenated_labels"],
    average_log_prob=self.loss_type == "ipo",
    is_encoder_decoder=self.is_encoder_decoder,
    label_pad_token_id=self.label_pad_token_id,
)

AIR-hl avatar May 28 '24 17:05 AIR-hl

This has been discussed in multiple github issues, and I believe the answer stems from the discussion that Huggingface had with the IPO authors here "After consulting with the authors of the IPO paper, we discovered that the implementation of IPO in TRL was incorrect; in particular, the loss over the log-likelihoods of the completions needs to be averaged instead of summed. We have added a fix..."

QiyaoWei avatar Jun 01 '24 15:06 QiyaoWei

This has been discussed in multiple github issues, and I believe the answer stems from the discussion that Huggingface had with the IPO authors here "After consulting with the authors of the IPO paper, we discovered that the implementation of IPO in TRL was incorrect; in particular, the loss over the log-likelihoods of the completions needs to be averaged instead of summed. We have added a fix..."

@QiyaoWei Please forgive me, I don't find any formula using average log-likelihood in paper

AIR-hl avatar Jun 02 '24 08:06 AIR-hl

This has been discussed in multiple github issues, and I believe the answer stems from the discussion that Huggingface had with the IPO authors here "After consulting with the authors of the IPO paper, we discovered that the implementation of IPO in TRL was incorrect; in particular, the loss over the log-likelihoods of the completions needs to be averaged instead of summed. We have added a fix..."

@QiyaoWei Please forgive me, I don't find any formula using average log-likelihood in paper

You can conduct practical experiments to ensure that the training is more stable and the results perform better. But now there is no need for average_log_probe=Ture, it has been placed outside!

Trangle avatar Jun 07 '24 08:06 Trangle

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Jul 11 '24 15:07 github-actions[bot]