Liger-Kernel
Weighted Cross Entropy Loss
🚀 The feature, motivation and pitch
Allow passing a weighting tensor to the cross-entropy loss (CEL), similar to C-RLFT, so that some tokens or inputs in the batch can carry a lower weight in the overall loss. See https://arxiv.org/abs/2309.11235
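To make the request concrete, here is a minimal plain-Python reference sketch of the computation I have in mind. It is not Liger-Kernel's API: the function name `weighted_cross_entropy`, the `weights` argument, and the choice to normalize by the sum of the weights are all assumptions for illustration, and other reductions (for example dividing by the token count) would be equally reasonable.

```python
import math


def weighted_cross_entropy(logits, targets, weights, ignore_index=-100):
    """Weighted mean of per-token cross entropy (illustrative reference only).

    logits:  list of per-token lists of raw scores, one score per class
    targets: list of target class indices, one per token
    weights: list of non-negative per-token weights (hypothetical argument)
    """
    total_loss = 0.0
    total_weight = 0.0
    for scores, target, weight in zip(logits, targets, weights):
        if target == ignore_index:
            continue  # skipped tokens add nothing to the loss or the normalizer
        # Numerically stable log-sum-exp over the class scores.
        max_score = max(scores)
        log_sum_exp = max_score + math.log(
            sum(math.exp(s - max_score) for s in scores)
        )
        # Per-token cross entropy is -log softmax(scores)[target].
        token_loss = log_sum_exp - scores[target]
        total_loss += weight * token_loss
        total_weight += weight
    # Assumption: normalize by the total weight; return 0.0 if nothing counted.
    return total_loss / total_weight if total_weight > 0 else 0.0


# Usage: the second token is down-weighted, so it pulls the loss less
# than it would under a plain mean.
loss = weighted_cross_entropy(
    logits=[[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]],
    targets=[0, 1],
    weights=[1.0, 0.25],
)
```

With all weights equal to 1 this reduces to the usual mean cross entropy over non-ignored tokens, so the existing behavior could stay the default. A per-sample weighting could be expressed by repeating one weight across every token of that sample.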
Alternatives
No response
Additional context
No response