AutoCompressors
substep & segment
Hello, thanks for the great work! I wonder why the training procedure is split into several substeps, each of which has several segments. From the code, the softprompt is accumulated across the inputs. So why not just divide the inputs into several segments, accumulate the loss and softprompt in the forward_segment function, and divide by gradient_accumulation_steps in training_step?
Splitting a large batch by gradient_accumulation_steps is a standard feature of the Hugging Face Trainer. Our code additionally accumulates gradients over the segments to save more memory.
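To make the general idea of gradient accumulation concrete, here is a minimal pure-Python sketch. It is not the repository's code: the model (a single weight `w` fit with mean squared error) and the helper names `grad_mse` and `accumulated_grad` are hypothetical, and the toy ignores the softprompt that is passed from one segment to the next. It only shows that summing per-chunk gradients, each scaled by `1 / accumulation_steps`, reproduces the full-batch gradient while only one chunk is processed at a time.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, accumulation_steps):
    """Split the batch into equal chunks and sum the scaled per-chunk gradients."""
    assert len(xs) % accumulation_steps == 0, "batch must split evenly"
    chunk = len(xs) // accumulation_steps
    total = 0.0
    for i in range(accumulation_steps):
        lo, hi = i * chunk, (i + 1) * chunk
        # Dividing by accumulation_steps mirrors scaling the loss before backward.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accumulation_steps
    return total


# Hypothetical toy data: y = 2 * x, with the weight initialised at 0.5.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)
acc = accumulated_grad(w, xs, ys, accumulation_steps=2)
print(full, acc)
```

In a real training loop the saving comes from holding the activations of only one chunk in memory at a time; the equality above holds when the chunks are of equal size.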