Aadyot Bhatnagar
@dakinggg I've added a unit test that requires sample-based and token-based weighting to produce different outcomes when padding is present.
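For readers following along, here is a minimal standalone sketch of why the two weightings diverge once padding is involved. It is not the test from this PR and does not use Composer's API: the helper names (`microbatch_stats`, `accumulate_loss`) and the toy data are hypothetical, and it assumes each microbatch reports a mean loss over its non-padding tokens, which is then combined across microbatches using either sample counts or token counts as weights.

```python
from typing import List, Tuple

# One sample = (per-token losses, mask), where mask is 1 for a real token
# and 0 for padding. These types and helpers are illustrative only.
Sample = Tuple[List[float], List[int]]


def microbatch_stats(microbatch: List[Sample]) -> Tuple[float, int, int]:
    """Return (mean loss over real tokens, number of samples, number of real tokens)."""
    total = 0.0
    tokens = 0
    for losses, mask in microbatch:
        for loss, keep in zip(losses, mask):
            if keep:
                total += loss
                tokens += 1
    return total / tokens, len(microbatch), tokens


def accumulate_loss(microbatches: List[List[Sample]], by_tokens: bool) -> float:
    """Combine per-microbatch mean losses, weighted by token or sample share."""
    stats = [microbatch_stats(mb) for mb in microbatches]
    denom = sum(n_tok if by_tokens else n_samp for _, n_samp, n_tok in stats)
    return sum(
        mean * ((n_tok if by_tokens else n_samp) / denom)
        for mean, n_samp, n_tok in stats
    )


# Microbatch A: one sample, 4 real tokens, loss 1.0 each.
mb_a = [([1.0, 1.0, 1.0, 1.0], [1, 1, 1, 1])]
# Microbatch B: one sample, 1 real token with loss 3.0, then 3 padding tokens.
mb_b = [([3.0, 0.0, 0.0, 0.0], [1, 0, 0, 0])]

# Sample weighting: each microbatch holds one sample, so weights are 0.5 and 0.5.
sample_weighted = accumulate_loss([mb_a, mb_b], by_tokens=False)
# Token weighting: weights are 4/5 and 1/5, so the padded microbatch counts less.
token_weighted = accumulate_loss([mb_a, mb_b], by_tokens=True)

# Without padding, both microbatches have equal token counts and the two agree.
mb_b_full = [([3.0, 3.0, 3.0, 3.0], [1, 1, 1, 1])]
no_pad_sample = accumulate_loss([mb_a, mb_b_full], by_tokens=False)
no_pad_token = accumulate_loss([mb_a, mb_b_full], by_tokens=True)
```

With this toy data the sample-weighted loss is 2.0 and the token-weighted loss is about 1.4, while the no-padding variant gives the same value under both schemes. That is the property a test of this kind would assert.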
@dakinggg after some further thought, I made one additional change to the Composer code. Currently, the total loss is simply averaged across all ranks, since the trainer assumes that all...
Thanks @mvpatel2000. For the time being, I've implemented these changes by overriding the `Trainer` class in our local repo, so we will be okay for now. Happy to get further...
@dakinggg bumping this PR for review.