gpt-2-simple
What exactly happens in one training step?
I'm using gpt-2-simple for model fine-tuning and am wondering what exactly happens in one "training step". Is the entire fine-tuning dataset fed into the model, or only one unit (i.e. one row of my training file)?
In a training step, a batch of 1024 tokens (about 2-3 paragraphs) of text is fed into the model; it does a forward pass, and the gradients are updated with the backward pass.
If a row of data is shorter than 2-3 paragraphs, the batch will contain several consecutive rows of data concatenated together (hence why randomizing the order of the rows is recommended).
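To make the packing behavior concrete, here is a minimal sketch (not gpt-2-simple's actual code): rows are concatenated into one token stream and then sliced into fixed-length windows, so short rows share a window with their neighbors. The whitespace "tokenizer", the tiny window size, and the `pack_rows` helper are all illustrative stand-ins; the real model uses BPE tokenization and 1024-token windows.

```python
WINDOW = 8  # illustrative stand-in for GPT-2's 1024-token context

def pack_rows(rows, window=WINDOW):
    """Concatenate rows into one token stream, then slice it into
    contiguous fixed-size windows. Short rows end up sharing a
    window with the rows that follow them."""
    stream = []
    for row in rows:
        stream.extend(row.split())  # placeholder for BPE encoding
    return [stream[i:i + window] for i in range(0, len(stream), window)]

rows = [
    "short row one",
    "another short row",
    "a somewhat longer row of training text here",
]
for chunk in pack_rows(rows):
    print(chunk)
```

Note how the first window mixes tokens from all three rows: if the rows were sorted (say, by topic), every window would contain correlated text, which is why shuffling the rows first is recommended.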
I had tried implementing row-by-row training when working on aitextgen, but the packing implementation used here in gpt-2-simple performed much better, for whatever reason.