composer eval_batch

eval_batch_size=auto

Open mvpatel2000 opened this issue 2 years ago • 1 comments

Aug 17 '22 00:08 mvpatel2000

Oops meant to open draft PR -- not ready for review yet!

Aug 17 '22 00:08 mvpatel2000

I think this works -- tested on ResNet with a large batch size, but I'd love some help testing it on more workloads. I don't have the bandwidth to test this across a wide range of workloads, so I'm going to leave that to the people who've been requesting this feature: @moinnadeem @abhi-mosaic.

Otherwise, I'll get to this sometime mid-to-late next week (OOO for the rest of this week).

Aug 17 '22 19:08 mvpatel2000

Is this ready for review?

Aug 17 '22 19:08 dskhudia

Is this ready for review?

Yes! :)

Aug 17 '22 19:08 mvpatel2000

Do we clear the already-computed gradients if we run out of memory while backward is only partially executed? On retry, PyTorch will accumulate (+=) into any gradients that were already computed.

Aug 18 '22 17:08 dskhudia
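
To make the concern above concrete, here is a minimal sketch in plain PyTorch (not Composer's actual retry code, which this thread does not show). It assumes a toy `Linear` model on CPU and stands in for the out-of-memory case with an extra `backward()` call that leaves gradients behind; the helper name `run_backward` is hypothetical.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)


def run_backward():
    # Hypothetical helper: one forward/backward pass over the full batch.
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()


# Reference: gradients from a single clean backward pass.
model.zero_grad(set_to_none=True)
run_backward()
clean_grad = model.weight.grad.clone()

# Stand-in for a failed attempt that left gradients in .grad:
# a second backward WITHOUT clearing accumulates into the existing tensors.
run_backward()
stale_grad = model.weight.grad.clone()

# Retry done safely: clear gradients first, then run backward again.
model.zero_grad(set_to_none=True)
run_backward()
retry_grad = model.weight.grad.clone()

print(torch.allclose(stale_grad, 2 * clean_grad))  # accumulated, not replaced
print(torch.allclose(retry_grad, clean_grad))      # cleared before retry
```

In this toy case the uncleared retry exactly doubles the gradient. After a real partial backward only some parameters would carry stale values, so the error would be harder to spot, which is why clearing gradients before each retry seems like the safer default.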
