pytorch-transformers-classification

Model performance degrades when moved to Multi-GPU

Open ereday opened this issue 4 years ago • 5 comments

Hi,

When I run your code on multiple GPUs, performance degrades severely compared to the single-GPU version. To make the code multi-GPU compatible, I only added two lines of code:

  • model = torch.nn.DataParallel(model) between your model = model_class.from_pretrained(args['model_name']) and model.to(device) calls

  • loss = loss.mean() after the loss = outputs[0] line in the train function.

Do you have any idea how I can get the same (or similar) performance in the multi-GPU setting?
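For readers following along, the two changes described above might look like the minimal sketch below. `TinyClassifier`, the tensor shapes, and the optimizer settings are hypothetical stand-ins rather than the repository's actual code; only `torch.nn.DataParallel`, `model.to(device)`, `loss = outputs[0]`, and `loss.mean()` come from the post. The sketch also runs on a single GPU or on CPU, where the wrapper is skipped.

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for the pretrained model; returns (loss, logits)."""

    def __init__(self, num_features=8, num_labels=2):
        super().__init__()
        self.linear = nn.Linear(num_features, num_labels)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, inputs, labels):
        logits = self.linear(inputs)
        loss = self.loss_fn(logits, labels)
        return loss, logits


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = TinyClassifier()
# Change 1: wrap the model so each batch is split across the visible GPUs.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Assumed toy batch: 16 examples with 8 features each.
inputs = torch.randn(16, 8, device=device)
labels = torch.randint(0, 2, (16,), device=device)

outputs = model(inputs, labels)
loss = outputs[0]
# Change 2: under DataParallel the gathered loss has one entry per GPU,
# so reduce it to a scalar; on a single device this is a no-op.
loss = loss.mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Note that `DataParallel` scatters each input batch along dimension 0, so every replica processes only a slice of the batch.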

These are the results I got with these two settings:

  • With multi-GPU training: evaluate_loss = 0.3928874781464829, fn = 116, fp = 81, mcc = 0.5114751200090137, tn = 1291, tp = 136

  • With single-GPU training: evaluate_loss = 0.39542119007776766, fn = 82, fp = 126, mcc = 0.5465463104769824, tn = 1246, tp = 170

Although the average loss values are similar, the other metrics differ substantially.
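To make the gap concrete, the reported MCC values can be recomputed from the confusion-matrix counts above. This is a small sketch in plain Python; the `mcc` helper and the `runs` dictionary are illustrative names, not part of the repository, and precision and recall are added only as extra context.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0


# Counts copied from the two runs reported in the post.
runs = {
    "multi_gpu": {"tp": 136, "tn": 1291, "fp": 81, "fn": 116},
    "single_gpu": {"tp": 170, "tn": 1246, "fp": 126, "fn": 82},
}

for name, c in runs.items():
    predicted_positive = c["tp"] + c["fp"]
    precision = c["tp"] / predicted_positive
    recall = c["tp"] / (c["tp"] + c["fn"])
    print(
        f"{name}: mcc={mcc(**c):.4f} precision={precision:.4f} "
        f"recall={recall:.4f} predicted_positive={predicted_positive}"
    )
```

Both runs score the same 1624 examples with 252 actual positives, but the multi-GPU model predicts the positive class 217 times against 296 for the single-GPU model, so it gains precision and loses recall.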

ereday, Nov 08 '19 13:11

Those changes should be sufficient to enable multi-gpu training in my experience. Is there any other difference (e.g. batch size) between the two runs?

ThilinaRajapakse, Nov 08 '19 13:11

Nope, I did not change any of the variables in the args dictionary.

ereday, Nov 08 '19 13:11

This is probably a silly question, but did you try this multiple times and receive the same results?

ThilinaRajapakse, Nov 08 '19 13:11

Yes, I ran the code with the same configuration multiple times. There is no difference across runs.

ereday, Nov 08 '19 13:11

Sorry, I am not sure why this is happening. I recommend trying the Simple Transformers library, as it supports multi-GPU training by default, and I have used multi-GPU training with that library without any performance degradation.

ThilinaRajapakse, Nov 09 '19 10:11