
Training process for multi-GPUs

Open jaehoon-hahm opened this issue 2 years ago • 5 comments

Hi, I am trying to run training/evaluation with 4 A100s. However, after some experiments I noticed that the training speed was the same as with a single GPU. Am I missing something?

jaehoon-hahm avatar Feb 21 '23 04:02 jaehoon-hahm

Hello, Jaehoon. I encountered the same problem. I suspect this is because your TensorFlow package is not installed correctly. I recommend following the tips at https://www.tensorflow.org/install/pip step by step. This may help you solve the problem.

mo666666 avatar Feb 24 '23 05:02 mo666666

However, after solving the above issue, also as a 4×A100 user, I ran into a CUDA out-of-memory issue. Did you encounter this issue with the code in this repository?

mo666666 avatar Feb 24 '23 05:02 mo666666

Take a look at this: https://github.com/yang-song/score_sde_pytorch/issues/14#issuecomment-1075887846. I solved the CUDA memory issue by adding that change to main.py.

jaehoon-hahm avatar Feb 25 '23 02:02 jaehoon-hahm
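For anyone landing here later: the change in the linked comment boils down to stopping TensorFlow (which this repo uses only for the data pipeline) from pre-allocating the GPU memory that PyTorch needs. A minimal sketch of that kind of change, to be called near the top of main.py; the exact code in the linked comment may differ:

```python
def limit_tf_gpu_memory():
    """Keep TensorFlow from grabbing all GPU memory.

    score_sde_pytorch uses TensorFlow only for data loading, so TF should
    not pre-allocate the GPUs that PyTorch trains on. Returns True if the
    settings were applied, False if TensorFlow is unavailable.
    """
    try:
        import tensorflow as tf  # optional in this sketch
    except ImportError:
        return False
    # Option 1: hide the GPUs from TensorFlow entirely; the data pipeline
    # runs fine on CPU.
    tf.config.experimental.set_visible_devices([], 'GPU')
    # Option 2 (alternative): let TF memory grow on demand instead:
    # for gpu in tf.config.experimental.list_physical_devices('GPU'):
    #     tf.config.experimental.set_memory_growth(gpu, True)
    return True
```

Call `limit_tf_gpu_memory()` before any TensorFlow op touches the GPU; once TF has initialized a device, these settings can no longer be changed.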

Ok, thank you very much!

mo666666 avatar Feb 25 '23 04:02 mo666666

Hi, Jaehoon! Did your training speed on 4×A100 improve? After re-checking my experiment, I found it is still quite slow: the utilization rate of each GPU is around 50%. Have you found another trick to accelerate training, or can the author @yang-song provide some advice?

mo666666 avatar Feb 26 '23 08:02 mo666666
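On the ~50% utilization: one common cause is that the repo drives all GPUs from a single process via torch.nn.DataParallel, which is often bottlenecked by the Python main loop and scatter/gather. The usual remedy is one process per GPU with DistributedDataParallel. A hedged sketch under that assumption; the Linear model and single step below are placeholders, not the repo's score model or training loop:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train_worker(rank, world_size):
    """One training process per GPU (placeholder model and step)."""
    os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
    os.environ.setdefault('MASTER_PORT', '29500')
    # 'gloo' lets this sketch run on CPU; use 'nccl' on the A100s.
    dist.init_process_group('gloo', rank=rank, world_size=world_size)
    model = torch.nn.Linear(8, 8)       # stand-in for the score model
    if torch.cuda.is_available():
        torch.cuda.set_device(rank)
        model = model.cuda(rank)
    ddp_model = DDP(model)              # syncs gradients across ranks
    opt = torch.optim.Adam(ddp_model.parameters(), lr=1e-3)
    x = torch.randn(16, 8)              # stand-in for a data batch
    if torch.cuda.is_available():
        x = x.cuda(rank)
    loss = ddp_model(x).pow(2).mean()   # stand-in for the score loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    dist.destroy_process_group()

# Launch with one process per GPU, e.g.:
# torch.multiprocessing.spawn(train_worker, args=(4,), nprocs=4)
```

Each rank should also get a disjoint shard of the data (e.g. via torch.utils.data.distributed.DistributedSampler), otherwise the GPUs just repeat each other's work.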