
Inquiry about Training Time and RuntimeError in Diffuser Code

Open · Jonyond-lin opened this issue 1 year ago · 2 comments

Hello,

Thank you for your great work. I recently encountered an issue while running the Diffuser training code from this repository, and I would appreciate your guidance.

During training, I encountered the following error:

```
Diffusion/ldm/models/diffusion/ddpm_compose.py", line 1237, in p_losses
    logvar_t = self.logvar[t].to(self.device)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (CPU)
```

I managed to resolve the error by moving `t` to the CPU, as sketched below. However, I noticed that a single training epoch takes quite a long time, nearly an hour. I am unsure whether this training time is normal or whether my workaround, for example by forcing part of the computation onto the CPU, is causing the slowdown.
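Concretely, my workaround is roughly the following one-line change inside `p_losses` (a sketch of what I did; `t` is the batch of timestep indices, which lives on the GPU during training):

```python
# Original line in ddpm_compose.py:
#   logvar_t = self.logvar[t].to(self.device)

# My workaround: index the CPU-resident self.logvar buffer with CPU indices,
# then move the gathered values back to the model's device.
logvar_t = self.logvar[t.cpu()].to(self.device)
```

I assume an equivalent fix would be to move `self.logvar` onto `self.device` once during setup instead, but I have only tested the version above.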

Could you please share your typical training time for a single epoch, so I can better understand if my situation is unusual? Additionally, if you suspect that there may be issues with my setup, I would greatly appreciate any suggestions or solutions you can offer.
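In case it helps with diagnosing my setup, this is roughly how I checked where the computation is running (a minimal sketch using standard PyTorch calls; the `nn.Linear` module is only a stand-in for the actual diffusion model):

```python
import torch
import torch.nn as nn

# Stand-in for the diffusion model; replace with the instantiated module.
model = nn.Linear(4, 4).to("cuda" if torch.cuda.is_available() else "cpu")

# True if a CUDA device is visible to PyTorch.
print(torch.cuda.is_available())

# Device of the model's parameters, e.g. cuda:0 when training on the GPU.
print(next(model.parameters()).device)
```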

Thank you very much for your assistance.

Jonyond-lin · Sep 19 '23 10:09