meshed-memory-transformer
Is scheduler.step() meant to be called both per epoch AND per batch?
In your train_xe function, scheduler.step() is called once per epoch and then again for every batch. Is this the intended training strategy, or an accidental mistake?
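To make the concern concrete, here is a minimal sketch of the call pattern as I understand it. It is not the repository's code: `CountingScheduler` and `run_training` are hypothetical names, and the stand-in scheduler only counts `step()` calls instead of changing a learning rate. It assumes the per-epoch call happens once at the start of each epoch.

```python
class CountingScheduler:
    """Hypothetical stand-in for an LR scheduler: it only counts step() calls."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


def run_training(num_epochs, batches_per_epoch, step_per_epoch, step_per_batch):
    """Return how many times the scheduler advanced over the whole run."""
    scheduler = CountingScheduler()
    for _ in range(num_epochs):
        if step_per_epoch:
            scheduler.step()  # the per-epoch call (assumed placement)
        for _ in range(batches_per_epoch):
            # forward pass, backward pass and optimizer.step() would go here
            if step_per_batch:
                scheduler.step()  # the per-batch call
    return scheduler.steps


both = run_training(3, 100, step_per_epoch=True, step_per_batch=True)
batch_only = run_training(3, 100, step_per_epoch=False, step_per_batch=True)
print(both, batch_only)  # 303 300
```

With both calls in place the schedule advances one extra step per epoch, so any warmup or decay defined in terms of iterations drifts slightly ahead of the real batch count. The difference is small, which is why I am not sure whether it is deliberate.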