transformer-models
GPT-2 doesn't include dropout layers
We would like to use these issues to gauge user interest.
The GPT-2 implementation does not include dropout layers. Adding them would be useful for further pre-training and fine-tuning workflows, where dropout helps prevent overfitting.
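For illustration, here is a minimal sketch of where dropout is usually placed in a GPT-2-style transformer block (after the attention output and after the feed-forward output, before each residual add). It uses generic Keras layers, not this repository's actual classes; the `TransformerBlock` name and the `dropout` argument are assumptions for the example, not part of this codebase:

```python
import tensorflow as tf
from tensorflow import keras


class TransformerBlock(keras.layers.Layer):
    """Illustrative GPT-2-style block showing typical dropout placements."""

    def __init__(self, hidden_dim=768, num_heads=12, dropout=0.1, **kwargs):
        super().__init__(**kwargs)
        self.ln_1 = keras.layers.LayerNormalization(epsilon=1e-5)
        self.attn = keras.layers.MultiHeadAttention(
            num_heads=num_heads,
            key_dim=hidden_dim // num_heads,
            dropout=dropout,  # attention-probability dropout
        )
        # Dropout on the attention output, before the residual add.
        self.attn_dropout = keras.layers.Dropout(dropout)
        self.ln_2 = keras.layers.LayerNormalization(epsilon=1e-5)
        self.mlp_in = keras.layers.Dense(4 * hidden_dim, activation="gelu")
        self.mlp_out = keras.layers.Dense(hidden_dim)
        # Dropout on the feed-forward output, before the residual add.
        self.mlp_dropout = keras.layers.Dropout(dropout)

    def call(self, x, training=False):
        # Self-attention sub-layer with pre-layer-norm and residual connection.
        h = self.ln_1(x)
        h = self.attn(h, h, use_causal_mask=True, training=training)
        x = x + self.attn_dropout(h, training=training)
        # Feed-forward sub-layer with pre-layer-norm and residual connection.
        h = self.mlp_out(self.mlp_in(self.ln_2(x)))
        return x + self.mlp_dropout(h, training=training)
```

Dropout would only be active during training (`training=True`), so inference behavior of existing checkpoints would be unaffected; a default rate of 0 could keep current behavior unless the user opts in.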