Inquiry about running on multiple GPUs

Open kdc12345 opened this issue 1 year ago • 3 comments

Hello,

I am currently using your project and am wondering if it is possible to run it on multiple GPUs. Specifically, I am interested in training the model on two GPUs to accelerate the training process.

I have tried to modify the code to support multiple GPUs, but I encountered some errors. Could you please let me know if your project supports multi-GPU training? If so, could you provide some guidance on how to implement it correctly?

Thank you for your help in advance!

Best regards!

kdc12345 • Apr 19 '23 08:04

Hi,

Thanks for the question.

We run our code on a single GPU. I think it might be possible to run the model on multiple GPUs, but it would take some effort to modify the code. Which project are you going to run on multiple GPUs? URL or TSA?

I would recommend updating the code for the network architecture and the forward function so that the code can run on multiple GPUs.
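
For readers following this thread, here is a minimal sketch of what such a change usually looks like in PyTorch, not the project's actual code. `TinyNet` is a hypothetical stand-in for the ResNet backbone, and the input shape and class count are made up. It assumes the forward function takes a batch-first tensor, since `nn.DataParallel` splits tensor inputs along the batch dimension.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the project's ResNet backbone."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.BatchNorm2d(8),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is batch-first; DataParallel gives each GPU a slice of dim 0.
        return self.classifier(self.features(x).flatten(1))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyNet().to(device)

# Wrap only when more than one GPU is visible; otherwise run unchanged.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

images = torch.randn(16, 3, 32, 32, device=device)  # made-up batch
logits = model(images)  # outputs are gathered back on the first device
```

With this wrapper, each BatchNorm layer normalizes with the statistics of its own GPU's slice of the batch. Any extra non-tensor arguments a real forward function takes are copied to every replica rather than split.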

Best!

WeiHongLee • Apr 21 '23 14:04

Hi,

Thanks for the quick reply.

I am going to run the URL project with multiple GPUs. Training URL always takes too long. How long did it take you to complete the URL training?

Thank you.

kdc12345 • Apr 22 '23 05:04

Hi,

I see. I think it is possible to run the URL training on multiple GPUs by modifying the network script (resnet) and other related code. Also, BatchNorm synchronization can be tricky for multi-domain learning over multiple GPUs.

Training URL over 8 domains took us 48 hours in total on a single V100 GPU, but with early stopping it will need much less.
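
To illustrate the BatchNorm point, here is a small sketch in plain Python with made-up numbers. It assumes, hypothetically, that each GPU receives a batch from a different domain, and compares the per-GPU statistics an unsynchronized BatchNorm layer would use with the statistics over the combined batch that a synchronized layer would use.

```python
from statistics import fmean, pvariance

# Made-up activations for one channel (hypothetical values).
gpu0_batch = [0.0, 1.0, 2.0, 3.0]      # batch from domain A on GPU 0
gpu1_batch = [10.0, 11.0, 12.0, 13.0]  # batch from domain B on GPU 1


def batch_stats(values):
    """Return the (mean, biased variance) BatchNorm uses to normalize."""
    return fmean(values), pvariance(values)


# Unsynchronized: each GPU normalizes with its own batch statistics.
local0 = batch_stats(gpu0_batch)
local1 = batch_stats(gpu1_batch)

# Synchronized: statistics are computed over the combined batch.
synced = batch_stats(gpu0_batch + gpu1_batch)

print("GPU 0 local :", local0)
print("GPU 1 local :", local1)
print("synchronized:", synced)
```

The local and synchronized statistics differ a lot when the domains differ, so the choice changes what the network sees after normalization. Which behavior suits multi-domain training is a design decision this sketch does not settle.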

Best!

WeiHongLee • Apr 25 '23 10:04