AutoCompressors substep & segment


Open · Lu-kuan-lpk opened this issue 1 year ago · 1 comment


Hello, thanks for the great work! I wonder why the training procedure is split into several substeps, each of which has several segments. From the code, the softprompt is accumulated across each input. So why not just divide the inputs into several segments, accumulate the loss and the softprompt in the forward_segment function, and apply the division by gradient_accumulate_steps in training_step?
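To make the two levels in the question concrete, here is a minimal pure-Python sketch of the data layout, assuming a batch of token-id lists. The helper names (split_into_substeps, split_into_segments, toy_summary, process_sequence) are hypothetical and are not taken from the AutoCompressors code. The summary is a toy stand-in for the real summary vectors. The sketch only shows how a batch is cut into substeps (micro-batches), how each sequence is cut into segments, and how the softprompt grows by one summary per segment so that segment k is conditioned on k earlier summaries.

```python
from typing import List, Tuple


def split_into_substeps(batch: List[List[int]], grad_accum_steps: int) -> List[List[List[int]]]:
    """Split a batch of sequences into equal micro-batches, one per substep."""
    if len(batch) % grad_accum_steps != 0:
        raise ValueError("batch size must be divisible by grad_accum_steps")
    size = len(batch) // grad_accum_steps
    return [batch[i * size:(i + 1) * size] for i in range(grad_accum_steps)]


def split_into_segments(seq: List[int], segment_len: int) -> List[List[int]]:
    """Cut one sequence into consecutive segments of at most segment_len tokens."""
    return [seq[i:i + segment_len] for i in range(0, len(seq), segment_len)]


def toy_summary(segment: List[int]) -> float:
    # Hypothetical stand-in for a segment's summary vectors: the mean token id.
    # In the real model the summary would also be conditioned on the earlier softprompt.
    return sum(segment) / len(segment)


def process_sequence(seq: List[int], segment_len: int) -> Tuple[List[float], List[int]]:
    """Process segments in order, growing the softprompt by one summary per segment."""
    softprompt: List[float] = []
    visible: List[int] = []  # number of summaries each segment is conditioned on
    for segment in split_into_segments(seq, segment_len):
        visible.append(len(softprompt))
        softprompt = softprompt + [toy_summary(segment)]
    return softprompt, visible


if __name__ == "__main__":
    # Four 12-token sequences, two substeps, 4-token segments.
    batch = [list(range(i * 12, (i + 1) * 12)) for i in range(4)]
    for step, micro_batch in enumerate(split_into_substeps(batch, grad_accum_steps=2)):
        for seq in micro_batch:
            print(step, process_sequence(seq, segment_len=4))
```

The substep split only decides which sequences are processed together. The segment split is what creates the dependency chain through the softprompt.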

Jul 03 '24 07:07 Lu-kuan-lpk

Splitting a large batch into gradient_accumulation_steps substeps is a standard feature of the Hugging Face Trainer. Our code additionally accumulates gradients over the segments within each substep, which saves further memory.
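The arithmetic behind accumulating over both substeps and segments can be checked with a toy model. This is a minimal sketch under stated assumptions, not the AutoCompressors implementation. It uses a scalar linear model (prediction w * x, squared error) and normalizes every segment's gradient by the total token count of the whole batch, so that summing over substeps and segments reproduces the full-batch mean gradient while only one segment's terms are computed at a time. The function names are hypothetical. The sketch deliberately treats segments as independent. In AutoCompressors, later segments depend on earlier summary vectors, so a real per-segment backward pass must also propagate gradients into (or truncate at) those summaries, which this toy does not model. The loss scaling that the Hugging Face Trainer applies per substep may also differ from the global token normalization used here.

```python
from typing import List, Tuple

Token = Tuple[float, float]  # (x, y) pair for a toy model that predicts y as w * x


def grad_sum_sq(w: float, tokens: List[Token]) -> float:
    """Gradient w.r.t. w of sum((w * x - y) ** 2) over the given tokens."""
    return sum(2.0 * (w * x - y) * x for x, y in tokens)


def full_batch_grad(w: float, batch: List[List[Token]]) -> float:
    """Gradient of the mean per-token loss with the whole batch handled at once."""
    tokens = [t for seq in batch for t in seq]
    return grad_sum_sq(w, tokens) / len(tokens)


def accumulated_grad(w: float, batch: List[List[Token]],
                     grad_accum_steps: int, segment_len: int) -> float:
    """Same gradient, accumulated one segment at a time across substeps."""
    total_tokens = sum(len(seq) for seq in batch)
    size = len(batch) // grad_accum_steps
    grad = 0.0
    for step in range(grad_accum_steps):  # substeps: one micro-batch each
        for seq in batch[step * size:(step + 1) * size]:
            for i in range(0, len(seq), segment_len):  # segments within a sequence
                segment = seq[i:i + segment_len]
                # Only this segment's terms are needed here; dividing by the
                # global token count keeps the sum equal to the full-batch mean.
                grad += grad_sum_sq(w, segment) / total_tokens
    return grad


if __name__ == "__main__":
    batch = [
        [(1.0, 2.0), (2.0, 3.5), (3.0, 6.1)],
        [(0.5, 1.0), (1.5, 2.9)],
        [(2.5, 5.2), (1.0, 1.8), (4.0, 8.3), (0.2, 0.5)],
        [(3.5, 6.9)],
    ]
    print(full_batch_grad(1.7, batch), accumulated_grad(1.7, batch, 2, 2))
```

Dividing each segment's loss by its own length and then averaging would weight short segments too heavily. Normalizing by one global count avoids that when segment or sequence lengths differ.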

Aug 01 '24 06:08 CodeCreator
