
In compute_loss, when batch size > 1, is it appropriate to average the loss directly at token granularity, or would averaging across samples be better?

Open yjc11 opened this issue 1 year ago • 1 comment

Start Date

8/14/2024

Implementation PR

No response

Reference Issues

No response

Summary

Basic Example

    if labels is not None:
        # Flatten the tokens so every (token, label) pair is scored independently.
        # The default reduction="mean" averages over all tokens in the batch,
        # i.e. token-granularity averaging, not per-sample averaging.
        loss_fct = nn.CrossEntropyLoss()
        logits = outputs.logits.view(-1, self.model.config.vocab_size).contiguous()
        labels = labels.view(-1).long().contiguous()
        # Enable model parallelism
        labels = labels.to(logits.device)
        loss = loss_fct(logits, labels)
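For comparison, a per-sample averaging variant could look like the sketch below. This is not taken from the MiniCPM-V code; it is a minimal illustration that assumes `logits` of shape `[batch, seq_len, vocab_size]`, `labels` of shape `[batch, seq_len]`, and the conventional `-100` as the padding/ignore label.

```python
import torch
import torch.nn as nn

def per_sample_mean_loss(logits: torch.Tensor,
                         labels: torch.Tensor,
                         ignore_index: int = -100) -> torch.Tensor:
    """Average the loss within each sample first, then across samples.

    logits: [batch, seq_len, vocab_size]
    labels: [batch, seq_len], with padded positions set to ignore_index
    """
    loss_fct = nn.CrossEntropyLoss(reduction="none", ignore_index=ignore_index)
    batch, seq_len, vocab = logits.shape
    # Per-token losses, reshaped back to [batch, seq_len]
    token_loss = loss_fct(
        logits.view(-1, vocab), labels.view(-1).long()
    ).view(batch, seq_len)
    mask = (labels != ignore_index).float()
    # Mean over the valid tokens of each sample, then mean over samples,
    # so every sample contributes equally regardless of its length
    per_sample = (token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return per_sample.mean()
```

With this scheme a short sample and a long sample carry the same weight in the final loss, whereas the token-granularity mean above weights each sample in proportion to its token count.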

Drawbacks

Unresolved questions

No response

yjc11 avatar Aug 14 '24 03:08 yjc11

Hello. My understanding is that computing cross-entropy at token granularity gives a smoother loss, and also makes it easier to handle samples of different lengths.
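To make the trade-off concrete, here is a toy calculation (with invented per-token loss values) showing that the token-granularity mean weights longer samples more heavily, while the per-sample mean treats every sample equally:

```python
# Toy per-token losses for two samples of different lengths (invented numbers).
sample_a = [2.0, 2.0]              # short sample, 2 tokens
sample_b = [1.0, 1.0, 1.0, 1.0]    # long sample, 4 tokens

all_tokens = sample_a + sample_b

# Token-level mean: every token counts equally, so the longer sample dominates.
token_mean = sum(all_tokens) / len(all_tokens)  # (4 + 4) / 6

# Sample-level mean: each sample counts equally regardless of length.
sample_mean = (sum(sample_a) / len(sample_a)
               + sum(sample_b) / len(sample_b)) / 2  # (2 + 1) / 2

print(token_mean, sample_mean)
```

The two averages differ whenever sample lengths differ, which is exactly the situation the question is about.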

LDLINGLINGLING avatar Aug 14 '24 08:08 LDLINGLINGLING

> Hello. My understanding is that computing cross-entropy at token granularity gives a smoother loss, and also makes it easier to handle samples of different lengths.

Thanks for the reply.

yjc11 avatar Aug 16 '24 05:08 yjc11