
Training process for multi-GPUs

Open jaehoon-hahm opened this issue 2 years ago • 5 comments

Hi, I am trying to run training/evaluation with 4 A100s. However, after some experiments I noticed that the training speed was the same as with a single GPU. Am I missing something?

jaehoon-hahm avatar Feb 21 '23 04:02 jaehoon-hahm

Hello, Jaehoon. I encountered the same problem. I suspect this is because your TensorFlow package is not installed correctly. I recommend following the tips at https://www.tensorflow.org/install/pip step by step. This may help you solve the problem.

mo666666 avatar Feb 24 '23 05:02 mo666666

However, after solving the above issue, also as a 4×A100 user, I ran into a CUDA out-of-memory issue. Did you encounter this issue with the code in this repository?

mo666666 avatar Feb 24 '23 05:02 mo666666

Take a look at this: https://github.com/yang-song/score_sde_pytorch/issues/14#issuecomment-1075887846. I solved the CUDA memory issue by adding that change to main.py.

jaehoon-hahm avatar Feb 25 '23 02:02 jaehoon-hahm
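For anyone landing here later: the change in the linked comment boils down to stopping TensorFlow (which this repo uses only for the data pipeline) from pre-allocating the GPU memory that PyTorch needs. A minimal sketch of that kind of change, to be called near the top of main.py; the exact code in the linked comment may differ:

```python
def limit_tf_gpu_memory():
    """Keep TensorFlow from grabbing all GPU memory.

    score_sde_pytorch uses TensorFlow only for data loading, so TF should
    not pre-allocate the GPUs that PyTorch trains on. Returns True if the
    settings were applied, False if TensorFlow is unavailable.
    """
    try:
        import tensorflow as tf  # optional in this sketch
    except ImportError:
        return False
    # Option 1: hide the GPUs from TensorFlow entirely; the data pipeline
    # runs fine on CPU.
    tf.config.experimental.set_visible_devices([], 'GPU')
    # Option 2 (alternative): let TF memory grow on demand instead:
    # for gpu in tf.config.experimental.list_physical_devices('GPU'):
    #     tf.config.experimental.set_memory_growth(gpu, True)
    return True
```

Call `limit_tf_gpu_memory()` before any TensorFlow op touches the GPU; once TF has initialized a device, these settings can no longer be changed.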

Ok, thank you very much!

mo666666 avatar Feb 25 '23 04:02 mo666666

Hi, Jaehoon! Did your training speed on 4×A100 improve? After re-checking my experiment, I found it is still quite slow: the utilization rate of each GPU is around 50%. Have you found another trick to accelerate training, or can the author @yang-song provide some advice?

mo666666 avatar Feb 26 '23 08:02 mo666666
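On the ~50% utilization: one common cause is that the repo drives all GPUs from a single process via torch.nn.DataParallel, which is often bottlenecked by the Python main loop and scatter/gather. The usual remedy is one process per GPU with DistributedDataParallel. A hedged sketch under that assumption; the Linear model and single step below are placeholders, not the repo's score model or training loop:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train_worker(rank, world_size):
    """One training process per GPU (placeholder model and step)."""
    os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
    os.environ.setdefault('MASTER_PORT', '29500')
    # 'gloo' lets this sketch run on CPU; use 'nccl' on the A100s.
    dist.init_process_group('gloo', rank=rank, world_size=world_size)
    model = torch.nn.Linear(8, 8)       # stand-in for the score model
    if torch.cuda.is_available():
        torch.cuda.set_device(rank)
        model = model.cuda(rank)
    ddp_model = DDP(model)              # syncs gradients across ranks
    opt = torch.optim.Adam(ddp_model.parameters(), lr=1e-3)
    x = torch.randn(16, 8)              # stand-in for a data batch
    if torch.cuda.is_available():
        x = x.cuda(rank)
    loss = ddp_model(x).pow(2).mean()   # stand-in for the score loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    dist.destroy_process_group()

# Launch with one process per GPU, e.g.:
# torch.multiprocessing.spawn(train_worker, args=(4,), nprocs=4)
```

Each rank should also get a disjoint shard of the data (e.g. via torch.utils.data.distributed.DistributedSampler), otherwise the GPUs just repeat each other's work.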