torchtune
Make gradient accumulation test stronger
In our current gradient accumulation test, the DummyDataset data and labels are all the same size, so the test never has to do anything special to count the tokens that contribute to the accumulated loss once masking is applied.
We should update the test to account for a DummyDataset with data/labels of various sizes.
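
To illustrate why variable sizes matter, here is a minimal sketch in plain Python. It is not the actual test or recipe code: the helper names, the per-token loss values, and the `IGNORE_INDEX = -100` masking convention are all assumptions made for the example. It compares the loss over one full batch with two ways of accumulating over micro-batches that contain different numbers of unmasked tokens.

```python
# Hypothetical sketch: helper names and values are illustrative, not torchtune code.
IGNORE_INDEX = -100  # assumed label value marking masked (ignored) positions


def unmasked_loss_sum(per_token_losses, labels):
    """Sum the per-token losses at positions whose label is not masked."""
    return sum(l for l, y in zip(per_token_losses, labels) if y != IGNORE_INDEX)


def unmasked_count(labels):
    """Count the positions that contribute to the loss."""
    return sum(1 for y in labels if y != IGNORE_INDEX)


def full_batch_loss(samples):
    """Reference: mean loss over every unmasked token in a single batch."""
    total = sum(unmasked_loss_sum(l, y) for l, y in samples)
    count = sum(unmasked_count(y) for _, y in samples)
    return total / count


def token_normalized_accumulation(micro_batches):
    """Accumulate loss sums and token counts, then divide once at the end."""
    running_loss, num_tokens = 0.0, 0
    for batch in micro_batches:
        running_loss += sum(unmasked_loss_sum(l, y) for l, y in batch)
        num_tokens += sum(unmasked_count(y) for _, y in batch)
    return running_loss / num_tokens


def naive_accumulation(micro_batches):
    """Average the per-micro-batch mean losses, ignoring token counts."""
    means = [full_batch_loss(batch) for batch in micro_batches]
    return sum(means) / len(means)


# Two samples with different numbers of unmasked tokens (4 versus 1).
sample_a = ([1.0, 2.0, 3.0, 4.0], [5, 6, 7, 8])
sample_b = ([2.0, 9.0, 9.0, 9.0], [3, IGNORE_INDEX, IGNORE_INDEX, IGNORE_INDEX])

reference = full_batch_loss([sample_a, sample_b])
token_normalized = token_normalized_accumulation([[sample_a], [sample_b]])
naive = naive_accumulation([[sample_a], [sample_b]])
```

When every sample has the same number of unmasked tokens, the two accumulation strategies agree, which is why the current test cannot tell them apart. With samples like the ones above, only the token-normalized version matches the full-batch reference, so a DummyDataset with varying sizes would exercise that code path.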