trl
Add gradient accumulation
With longer sequences, we quickly run out of memory as soon as the batch size is greater than 1.
We could probably make use of the `accelerate` context manager for gradient accumulation!