trl Add gradient accumulation

Open edbeeching opened this issue 1 year ago • 1 comment

With longer sequences and larger batches, we quickly run out of memory once the batch size is greater than 1.

Mar 14 '23 08:03 edbeeching
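
For context, here is a minimal sketch of why gradient accumulation helps with this: summing scaled gradients from micro-batches of size 1 gives the same gradient as one larger batch, so only one sample needs to fit in memory at a time. The 1-D model, the data, and the helper name `grad_mse` are all hypothetical and chosen only for illustration; nothing here is taken from trl.

```python
def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data and weight, for illustration only.
w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Gradient computed on the full batch of 4 samples at once.
full_batch_grad = grad_mse(w, xs, ys)

# Same gradient built from 4 micro-batches of size 1.
# Each micro-batch gradient is scaled by 1 / accumulation_steps so the
# sum matches the mean over the full batch.
accumulation_steps = len(xs)
accumulated_grad = 0.0
for x, y in zip(xs, ys):
    accumulated_grad += grad_mse(w, [x], [y]) / accumulation_steps

print(full_batch_grad, accumulated_grad)
```

The optimizer step would then be taken once, after the last micro-batch, using the accumulated gradient.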

We could probably use the `accelerate` context manager for gradient accumulation (`Accelerator.accumulate`)!

Mar 14 '23 08:03 lvwerra