DeepSpeedExamples
No ignore index
If the labels corresponding to pad tokens are not set to the ignore index, how can we avoid calculating the loss on pad tokens?
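For context, here is a minimal pure-Python sketch of what I mean by setting pad labels to the ignore index. It is not the code from this repo. The helper names `mask_pad_labels` and `masked_cross_entropy` are hypothetical, and the value `-100` is used only because it is the default `ignore_index` of `torch.nn.CrossEntropyLoss`. The sketch also skips the one-position label shift that causal language models normally apply.

```python
import math

# Assumption: -100 mirrors the default ignore_index of torch.nn.CrossEntropyLoss.
IGNORE_INDEX = -100


def mask_pad_labels(input_ids, pad_token_id, ignore_index=IGNORE_INDEX):
    """Build labels from input_ids, replacing every pad position with ignore_index."""
    return [ignore_index if tok == pad_token_id else tok for tok in input_ids]


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over the positions whose label is not ignore_index."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # pad position: contributes nothing to the loss
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        count += 1
    return total / count if count else 0.0


# Hypothetical toy batch: vocabulary of 3 tokens, token id 0 is the pad token.
input_ids = [1, 2, 0, 0]
labels = mask_pad_labels(input_ids, pad_token_id=0)
logits = [[0.0, 0.0, 0.0] for _ in input_ids]
loss = masked_cross_entropy(logits, labels)
```

With labels built this way, the logits at the two pad positions have no effect on `loss`. Without the masking step, those positions would be averaged into the loss like any other token, which is the situation I am asking about.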