gpt-2-Pytorch
How to train/fine-tune the model with multiple GPUs?
I have pulled the code from the `train` branch. Is there a way to train or fine-tune the GPT-2 model with data parallelism across multiple GPUs? Thanks for your help.
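For single-node, multi-GPU data parallelism, one common approach is to wrap the model in `torch.nn.DataParallel`, which replicates it on each visible GPU and splits every batch along the batch dimension. Below is a minimal sketch, not code from this repo: the `make_data_parallel` helper is a hypothetical name, and whether it fits depends on how the `train` branch builds its model and training loop. (`torch.nn.parallel.DistributedDataParallel` is generally faster, but requires launching one process per GPU.)

```python
import torch
import torch.nn as nn

def make_data_parallel(model: nn.Module) -> nn.Module:
    """Illustrative helper: replicate `model` across all visible GPUs.

    With more than one GPU, nn.DataParallel scatters each input batch
    across devices, runs replicas in parallel, and gathers the outputs
    on the default device. With zero or one GPU it is a no-op wrapper.
    """
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)  # batch is split across GPUs
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return model.to(device)
```

In a training loop you would call this once after constructing the GPT-2 module (for example `model = make_data_parallel(model)`), then train as usual; the forward pass and loss computation do not need to change, though checkpoints saved from a `DataParallel`-wrapped model store parameters under a `module.` prefix.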