gpt-2-simple: What exactly happens in one training step?


Open · bjoernhommel opened this issue 4 years ago · 1 comment

I'm using gpt-2-simple to fine-tune a model and wonder what exactly happens in one "training step". Is the entire fine-tuning dataset fed into the model, or only one unit (i.e., one row of my training file)?

Dec 15 '20 08:12 bjoernhommel

In a training step, a batch of 1024 tokens (about 2-3 paragraphs) of text is fed into the model: the model runs a forward pass, and the backward pass computes the gradients that are used to update the weights.
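To make the shape of one step concrete, here is a minimal sketch under heavy simplifying assumptions: the whole training file is already encoded as one long token list, and the "model" is a hypothetical toy bigram table of logits rather than GPT-2's transformer. `sample_window` and `train_step` are illustrative names, not gpt-2-simple functions. The step itself follows the description above: take a 1024-token window, run a forward pass to get the loss, run a backward pass to get gradients, then update the weights.

```python
import math
import random

# Toy sketch of ONE training step. Assumption: the "model" is a tiny bigram
# table of logits (vocab x vocab), a hypothetical stand-in for GPT-2.
CONTEXT_LEN = 1024  # tokens per sampled window, as described above


def sample_window(tokens, length, rng):
    """Return a random contiguous slice of `length` tokens from the encoded dataset."""
    start = rng.randrange(len(tokens) - length + 1)
    return tokens[start:start + length]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(logits, window, lr):
    """Forward pass -> loss, backward pass -> gradients, then one SGD weight update."""
    vocab = len(logits)
    grads = [[0.0] * vocab for _ in range(vocab)]
    pairs = list(zip(window[:-1], window[1:]))  # each token predicts the next one
    loss = 0.0
    for prev, nxt in pairs:
        probs = softmax(logits[prev])            # forward pass
        loss -= math.log(probs[nxt])             # cross-entropy on the true next token
        for j in range(vocab):                   # backward pass: dLoss/dlogit = p - onehot
            grads[prev][j] += probs[j] - (1.0 if j == nxt else 0.0)
    for i in range(vocab):                       # update the weights with the gradients
        for j in range(vocab):
            logits[i][j] -= lr * grads[i][j] / len(pairs)
    return loss / len(pairs)                     # mean loss, measured before the update


rng = random.Random(0)
vocab_size = 5
dataset = [i % vocab_size for i in range(5000)]  # toy, perfectly predictable token stream
logits = [[0.0] * vocab_size for _ in range(vocab_size)]

losses = []
for step in range(20):
    window = sample_window(dataset, CONTEXT_LEN, rng)
    losses.append(train_step(logits, window, lr=5.0))
print(f"step 0 loss {losses[0]:.3f}, step 19 loss {losses[-1]:.3f}")
```

Each step only ever sees one window, never the whole dataset; the loss falls across steps because many different windows are sampled over the course of training.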

If a row of data is shorter than 2-3 paragraphs, the batch will contain several consecutive rows of data (which is why shuffling the rows of your data is recommended).
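The sketch below illustrates why a single 1024-token window covers several short rows, and why shuffling helps. It assumes the rows are already tokenized and simply joined end to end (gpt-2-simple's actual joining and delimiter handling may differ), and `build_stream` and `rows_in_window` are hypothetical helpers. Without shuffling, rows that sit next to each other in the file (for example, rows sorted by date or topic) always land in the same windows.

```python
import random

CONTEXT_LEN = 1024  # tokens per training window, as described above


def build_stream(rows, shuffle, seed=0):
    """Concatenate (row_id, tokens) rows into one stream; also record each token's row."""
    rows = list(rows)
    if shuffle:
        random.Random(seed).shuffle(rows)  # randomize which rows end up adjacent
    stream, owner = [], []
    for row_id, tokens in rows:
        stream.extend(tokens)
        owner.extend([row_id] * len(tokens))
    return stream, owner


def rows_in_window(owner, start, length=CONTEXT_LEN):
    """Distinct row ids covered by tokens [start, start + length), in order."""
    covered = []
    for row_id in owner[start:start + length]:
        if not covered or covered[-1] != row_id:  # a row's tokens are contiguous
            covered.append(row_id)
    return covered


# Hypothetical data: 100 short rows of 300 tokens each (token values don't matter here).
rows = [(i, [0] * 300) for i in range(100)]

_, ordered = build_stream(rows, shuffle=False)
_, shuffled = build_stream(rows, shuffle=True)
print("in file order:", rows_in_window(ordered, 0))  # → [0, 1, 2, 3]
print("after shuffle:", rows_in_window(shuffled, 0))
```

With 300-token rows, a 1024-token window always spans four rows; shuffling changes which four rows are trained on together.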

When working on aitextgen, I tried implementing row-by-row training, but the implementation here in gpt-2-simple performed much better, for reasons I'm not sure of.

Dec 24 '20 04:12 minimaxir
