private-transformers
Training on multiple GPUs
I'm re-using the Trainer implemented in `examples.classification.src.trainer`. It largely looks like a port of the original Trainer source code, but I noticed that it has an additional check that stops training when multiple GPUs are available. Specifically:
```python
if self.args.local_rank != -1 or self.args.n_gpu > 1:
    raise ValueError("Multi-gpu and distributed training is currently not supported.")
```
What could go wrong if I comment this out and let multi-GPU training proceed with `torch.nn.DataParallel(model)`? Appreciate the well-written code, and thanks in advance for the help.
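To make the question concrete, here is a minimal sketch of what I have in mind after removing the check; `model` is a placeholder for the classifier I would pass to the Trainer, and the wrapping logic is my own assumption rather than anything from this repo:

```python
import torch
from torch import nn


def maybe_wrap_data_parallel(model: nn.Module) -> nn.Module:
    # Hypothetical helper: wrap the model in DataParallel only when more
    # than one GPU is visible, which is roughly what the stock HuggingFace
    # Trainer does where this repo's check would raise instead.
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    return model.to("cuda" if torch.cuda.is_available() else "cpu")
```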