Missing optimizer step
If max_steps or the length of the data is not divisible by gradient_accumulation_steps, some gradients are lost, since the optimizer update only takes place inside if (step + 1) % gradient_accumulation_steps == 0:
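For illustration, here is a minimal, self-contained sketch (not the actual pet training loop) showing which batches trigger an update under that condition:

```python
# Minimal sketch: with 10 batches and gradient_accumulation_steps = 4,
# the update condition fires after batches 3 and 7, so the gradients
# accumulated for the last two batches are silently dropped.
gradient_accumulation_steps = 4
num_batches = 10

updates = []
for step in range(num_batches):
    # ... loss.backward() would accumulate gradients here ...
    if (step + 1) % gradient_accumulation_steps == 0:
        updates.append(step)  # where optimizer.step() / zero_grad() would run

print(updates)  # [3, 7] -> batches 8 and 9 never contribute to an update
```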
Hi @FlorisFok, do you have suggestions as to how this should be fixed?
Hi @timoschick, by adding an or condition to the gradient accumulation if statement. This extra condition would also trigger an update when the loop reaches the final batch. First define:
last_batch = len(train_dataloader) - 1
Then modify the condition as follows:
if (step + 1) % gradient_accumulation_steps == 0 or last_batch == b_nr:
Here b_nr (the batch number) is taken from the first value returned by enumerate. In theory this could use the step variable that is already in the script, but step currently behaves exactly the same as global_step. I think that is also a mistake, but that depends on how the two are meant to be defined.
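Put together, a hedged sketch of the fix could look like the following (names such as b_nr and compute_loss are illustrative, not necessarily those used in pet's train() method):

```python
# Sketch of the proposed fix: also step the optimizer on the final batch,
# so a trailing partial accumulation window still produces an update.
def train_one_epoch(model, train_dataloader, optimizer, compute_loss,
                    gradient_accumulation_steps):
    last_batch = len(train_dataloader) - 1
    global_step = 0
    model.zero_grad()
    for b_nr, batch in enumerate(train_dataloader):
        loss = compute_loss(model, batch) / gradient_accumulation_steps
        loss.backward()
        # Update on every full accumulation window, and also on the final
        # batch so the remaining accumulated gradients are not discarded.
        if (b_nr + 1) % gradient_accumulation_steps == 0 or b_nr == last_batch:
            optimizer.step()
            model.zero_grad()
            global_step += 1
    return global_step
```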