Make gradient accumulation test stronger

Open joecummings opened this issue 2 years ago • 0 comments

In our current gradient accumulation test, the DummyDataset data and labels are all the same size, so the test never has to account for how masking changes the number of tokens contributing to the accumulated loss.

We should update the test to use a DummyDataset whose data and labels vary in size.
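
To illustrate why same-size data hides the problem, here is a minimal pure-Python sketch. It is not the existing test: `IGNORE_INDEX`, `mean_of_means`, and `token_weighted_mean` are hypothetical names, and the mask value of -100 is an assumption borrowed from the common PyTorch convention. With equal numbers of unmasked tokens per micro-batch, the two ways of accumulating the loss agree; with unequal numbers, they diverge.

```python
# Assumed mask value (hypothetical here; mirrors the usual ignore_index convention).
IGNORE_INDEX = -100


def mean_of_means(per_token_losses, labels_per_batch):
    """Naive accumulation: average each micro-batch, then average those averages."""
    batch_means = []
    for losses, labels in zip(per_token_losses, labels_per_batch):
        kept = [loss for loss, y in zip(losses, labels) if y != IGNORE_INDEX]
        batch_means.append(sum(kept) / len(kept))
    return sum(batch_means) / len(batch_means)


def token_weighted_mean(per_token_losses, labels_per_batch):
    """Sum the loss over all unmasked tokens, then divide by the total token count."""
    total_loss = 0.0
    total_tokens = 0
    for losses, labels in zip(per_token_losses, labels_per_batch):
        for loss, y in zip(losses, labels):
            if y != IGNORE_INDEX:
                total_loss += loss
                total_tokens += 1
    return total_loss / total_tokens


# Same number of unmasked tokens in every micro-batch: both methods agree.
equal_labels = [[1, 2], [3, 4]]
equal_losses = [[1.0, 3.0], [2.0, 2.0]]

# Different numbers of unmasked tokens: the methods disagree.
ragged_labels = [[1, IGNORE_INDEX, IGNORE_INDEX, IGNORE_INDEX], [1, 2, 3, 4]]
ragged_losses = [[4.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]

equal_naive = mean_of_means(equal_losses, equal_labels)
equal_weighted = token_weighted_mean(equal_losses, equal_labels)
ragged_naive = mean_of_means(ragged_losses, ragged_labels)
ragged_weighted = token_weighted_mean(ragged_losses, ragged_labels)
```

A test built only on the equal-size case cannot tell these two accumulation strategies apart, which is why data and labels of varying sizes would make it stronger.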

Mar 05 '24 19:03 joecummings