MiniCPM-V
In compute_loss, when batch size > 1, is it appropriate to average the loss directly at token granularity? Would averaging across samples be better?
起始日期 | Start Date
8/14/2024
实现PR | Implementation PR
No response
相关Issues | Reference Issues
No response
摘要 | Summary
基本示例 | Basic Example
```python
if labels is not None:
    # Flatten the tokens
    loss_fct = nn.CrossEntropyLoss()
    logits = outputs.logits.view(-1, self.model.config.vocab_size).contiguous()
    labels = labels.view(-1).long().contiguous()
    # Enable model parallelism
    labels = labels.to(logits.device)
    loss = loss_fct(logits, labels)
```
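For reference, a minimal sketch contrasting the two averaging schemes the question asks about. The tensors and shapes here are toy values, not taken from MiniCPM-V; it assumes padding positions in `labels` are marked with `-100`, the default `ignore_index` of `nn.CrossEntropyLoss`:

```python
import torch
import torch.nn as nn

# Toy batch: 2 samples of valid lengths 2 and 4; padding labeled -100
vocab_size = 10
torch.manual_seed(0)
logits = torch.randn(2, 4, vocab_size)
labels = torch.tensor([[1, 2, -100, -100],
                       [3, 4, 5, 6]])

# Token-granularity average (what the snippet above computes):
# one mean over all 6 valid tokens, so longer samples weigh more.
token_loss = nn.CrossEntropyLoss()(logits.view(-1, vocab_size),
                                   labels.view(-1))

# Sample-granularity average: mean within each sample first, then
# across samples, so every sample contributes equally.
per_tok = nn.CrossEntropyLoss(reduction="none")(
    logits.view(-1, vocab_size), labels.view(-1)).view(2, 4)
mask = (labels != -100).float()
per_sample = (per_tok * mask).sum(dim=1) / mask.sum(dim=1)
sample_loss = per_sample.mean()
```

The token-level mean is equivalent to a length-weighted average of the per-sample means, which is why the two disagree exactly when sample lengths differ.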
缺陷 | Drawbacks
未解决问题 | Unresolved questions
No response
Hi, my understanding is that computing cross-entropy at token granularity gives a smoother loss, and it also handles batches where samples have different lengths more naturally.
Thanks for the reply.