
Why does SFT sum the cross-entropy loss within each sequence?

Open yunjae-won opened this issue 1 year ago • 3 comments

Thank you for maintaining such an important repository. I really enjoyed and learned a lot from reading your DPO paper.

I have one question regarding the SFT loss implementation in the repository. Apparently, the SFT loss sums the cross-entropy loss within each sequence. However, to my understanding, the language-modeling loss conventionally averages the cross-entropy loss over all tokens in the batch (Ref: GPT2 Loss). This leads to a difference between TRL's SFTTrainer and this repository's SFT loss when computing the standard cross-entropy loss. Why is SFT implemented this way?
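To make the question concrete, here is a minimal sketch (not the repository's actual code) of the two reductions being compared, assuming PyTorch-style logits of shape `(batch, seq_len, vocab)` and labels that use `-100` for ignored positions (prompt/padding):

```python
import torch
import torch.nn.functional as F

def per_sequence_sum_loss(logits, labels):
    # Sum the token-level cross entropy within each sequence,
    # then average the per-sequence sums over the batch.
    token_losses = F.cross_entropy(
        logits.transpose(1, 2), labels, ignore_index=-100, reduction="none"
    )  # (batch, seq_len); ignored positions contribute 0
    return token_losses.sum(dim=-1).mean()

def per_token_mean_loss(logits, labels):
    # Conventional language-modeling loss: average over all
    # non-ignored tokens in the batch (as in GPT-2 / TRL's SFTTrainer).
    token_losses = F.cross_entropy(
        logits.transpose(1, 2), labels, ignore_index=-100, reduction="none"
    )
    mask = (labels != -100).float()
    return (token_losses * mask).sum() / mask.sum()
```

The two only coincide when every sequence contributes the same number of target tokens; otherwise the summed version weights longer sequences more heavily.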

yunjae-won avatar Feb 17 '24 07:02 yunjae-won

Same question here. Hi @YJWon99, do you have any ideas now?

HuXiangkun avatar May 17 '24 06:05 HuXiangkun

Has this been solved? Same question here.

yiyepiaoling0715 avatar Jan 03 '25 13:01 yiyepiaoling0715

@yiyepiaoling0715 I think it's a bug in their code: the loss should be averaged over the sequence. I made that revision in my experiments.
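For reference, a sketch of the kind of per-sequence averaging described here (not necessarily the exact change I made; same logits/labels conventions as above):

```python
import torch
import torch.nn.functional as F

def per_sequence_mean_loss(logits, labels):
    # Divide each sequence's summed loss by its own token count,
    # then average the per-sequence means over the batch.
    token_losses = F.cross_entropy(
        logits.transpose(1, 2), labels, ignore_index=-100, reduction="none"
    )  # (batch, seq_len); zeros at ignored positions
    mask = (labels != -100).float()
    return (token_losses.sum(dim=-1) / mask.sum(dim=-1).clamp(min=1)).mean()
```

Note this averages within each sequence first, which is slightly different from averaging over all tokens in the batch: every sequence then contributes equally regardless of its length.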

HuXiangkun avatar Jan 06 '25 01:01 HuXiangkun