glid-3-xl

How to use multiple GPUs to finetune the model?

Open cchangyou opened this issue 2 years ago • 6 comments

Hi, if I follow the instructions to run image_train_latent.py, it seems only one GPU is used. Can you advise on how to use multiple GPUs? Thanks.

cchangyou Jun 09 '22 02:06

You can use the mpiexec -n N python command as detailed in this repo: https://github.com/openai/guided-diffusion
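For orientation, here is a rough sketch of what launching under mpiexec usually amounts to: each of the N processes learns its rank and pins itself to one GPU. It assumes the launcher exports OMPI_COMM_WORLD_RANK (Open MPI) or PMI_RANK (MPICH-style). The function names and the 4-GPUs-per-node figure are hypothetical, and none of this is copied from either repo.

```python
import os


def local_gpu_for_rank(rank: int, gpus_per_node: int) -> int:
    """Map a global process rank to a GPU index on its node (round-robin)."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    return rank % gpus_per_node


def detect_rank(environ=os.environ) -> int:
    """Read the process rank from variables set by common MPI launchers.

    OMPI_COMM_WORLD_RANK is set by Open MPI, PMI_RANK by MPICH-style
    launchers. Falls back to 0 when neither is present (plain `python`).
    """
    for name in ("OMPI_COMM_WORLD_RANK", "PMI_RANK"):
        if name in environ:
            return int(environ[name])
    return 0


if __name__ == "__main__":
    rank = detect_rank()
    gpu = local_gpu_for_rank(rank, gpus_per_node=4)  # assumption: 4 GPUs per node
    # This must happen before any CUDA library is initialised in the process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    print(f"rank {rank} -> GPU {gpu}")
```

Saved under a name of your choice and started with mpiexec -n 4 python, this should print one line per process. If every line reports rank 0, the launcher's rank variables are not reaching the script, which would be consistent with all work landing on GPU 0.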

limiteinductive Jun 09 '22 08:06

Got it. Thank you.

cchangyou Jun 09 '22 17:06

@cchangyou Did mpiexec -n N python work in your case? I tried it with multiple GPUs but I am still getting a memory error, because only GPU 0 is being used. I ran: mpiexec -n 4 python

Thanks.

alishan2040 Jun 20 '22 18:06

@alishan2040 The load is not shared among GPUs. You'll need multiple GPUs, each with enough VRAM on its own.
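To make that concrete, here is a back-of-the-envelope sketch. It assumes plain data parallelism (a full model replica per GPU) and an Adam-style optimizer in fp32. The 1-billion-parameter figure is purely illustrative, not a measurement of this model, and activations are ignored, so real usage will be higher.

```python
def training_state_gib(n_params: int, bytes_per_value: int = 4) -> float:
    """Rough memory for weights, gradients and two Adam moment buffers.

    Activations and framework overhead are deliberately left out.
    """
    copies = 4  # weights + gradients + Adam first and second moments
    return n_params * bytes_per_value * copies / 2**30


def per_gpu_gib_data_parallel(n_params: int, n_gpus: int) -> float:
    """Per-GPU training state when every GPU holds a full replica.

    n_gpus is accepted only to show that it does not change the result.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return training_state_gib(n_params)


if __name__ == "__main__":
    n_params = 1_000_000_000  # hypothetical parameter count, for illustration only
    for n_gpus in (1, 4):
        gib = per_gpu_gib_data_parallel(n_params, n_gpus)
        print(f"{n_gpus} GPU(s): about {gib:.1f} GiB of training state per GPU")
```

Under these assumptions, adding GPUs raises throughput but leaves the per-GPU footprint unchanged, so a model that does not fit on one 16 GB card will not fit on four of them either.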

limiteinductive Jun 20 '22 21:06

@limiteinductive How much VRAM is enough for a single GPU? I now have 4 GPUs with 16 GB of VRAM each. Previously I had a single GPU with 24 GB of VRAM. In both cases I got memory errors.

alishan2040 Jun 20 '22 21:06

@alishan2040 I have only tried it on A100s.

limiteinductive Jun 23 '22 16:06