Multi-GPU is broken
2x3090 instance on Runpod, using the Runpod notebook on their Stable Diffusion image. Training on GPU 0 works fine, but I can't train on GPUs 0 and 1 together, nor start a second training on GPU 1 by itself.
Here is what happens when a training run is already going on GPU 0 and I try to start a separate run on GPU 1. It looks like GPU 0 is hardcoded somewhere.
!python "main.py" \
--base configs/stable-diffusion/v1-finetune_unfrozen.yaml \
-t \
--actual_resume "model.ckpt" \
--reg_data_root "{reg_data_root}" \
-n "{project_name}" \
--gpus 1, \
--data_root "/workspace/Dreambooth-Stable-Diffusion/MS" \
--max_training_steps {max_training_steps} \
--class_word "{class_word}" \
--token "{token}" \
--no-test
.....
Traceback (most recent call last):
  File "main.py", line 665, in <module>
    model = load_model_from_config(config, opt.actual_resume)
  File "main.py", line 42, in load_model_from_config
    model.cuda()
  File "/venv/lib/python3.8/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 138, in cuda
    return super().cuda(device=device)
  File "/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 688, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  File "/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  File "/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 601, in _apply
    param_applied = fn(param)
  File "/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 688, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 23.70 GiB total capacity; 0 bytes already allocated; 13.56 MiB free; 0 bytes reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 883, in <module>
    if trainer.global_rank == 0:
NameError: name 'trainer' is not defined
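For reference, the crash happens inside load_model_from_config, i.e. before any Lightning Trainer exists, so the --gpus selection hasn't been applied yet and the bare model.cuda() always lands on cuda:0. Assuming the 1.x-era pytorch_lightning shown in the traceback, the flag itself is only a device selector for the Trainer, roughly:

from pytorch_lightning import Trainer

# Rough illustration of how a 1.x Trainer reads the gpus argument (assumed, not code from main.py):
Trainer(gpus=[1])     # what "--gpus 1," parses to: use device index 1 only
Trainer(gpus=1)       # use one GPU, which in practice means cuda:0
Trainer(gpus=[0, 1])  # what "--gpus 0,1" parses to: use both cards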
(The NameError at the end is only a secondary failure: the OOM is raised before trainer is ever assigned, so the cleanup code at line 883 can't reference it.) Something like this would fix the device problem, no? Pass gpuinfo in when load_model_from_config is called in main.py:
def load_model_from_config(config, gpuinfo, ckpt, verbose=False):
    print(f"Loading model from {ckpt}")
    pl_sd = torch.load(ckpt, map_location="cpu")
    sd = pl_sd["state_dict"]
    config.model.params.ckpt_path = ckpt
    model = instantiate_from_config(config.model)
    m, u = model.load_state_dict(sd, strict=False)
    if len(m) > 0 and verbose:
        print("missing keys:")
        print(m)
    if len(u) > 0 and verbose:
        print("unexpected keys:")
        print(u)
    # Send the model to the GPU index that was requested (e.g. "1," -> "cuda:1")
    # instead of the unconditional model.cuda(), which always means cuda:0.
    device = torch.device("cuda:" + str(gpuinfo).rstrip(",")) if torch.cuda.is_available() else torch.device("cpu")
    model.to(device)
    model.eval()
    return model
A bare model.cuda() with no device argument always targets the current device, which is cuda:0 by default.
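A minimal sketch of the matching call-site change in main.py, assuming opt.gpus holds the raw --gpus string (e.g. "1,"); the names here are illustrative, not verbatim from the repo:

# Hypothetical call site: forward the --gpus value so the checkpoint is loaded onto the requested card.
model = load_model_from_config(config, opt.gpus, opt.actual_resume)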
Is multi GPU supposed to be supported?
Potentially related (NameError: name 'trainer' is not defined):
- https://github.com/JoePenna/Dreambooth-Stable-Diffusion/issues/28
- https://github.com/JoePenna/Dreambooth-Stable-Diffusion/issues/53
- https://github.com/JoePenna/Dreambooth-Stable-Diffusion/issues/86
- https://github.com/JoePenna/Dreambooth-Stable-Diffusion/issues/87
I went to line 896 in main.py and changed "trainer" to "Trainer", and now it's working
Originally posted by @Pegaxsus in https://github.com/JoePenna/Dreambooth-Stable-Diffusion/issues/86#issuecomment-1295861499
Something like this would fix it, no? Pass gpuinfo in when load_model_from_config is called in main.py.
The following seems to give a solution:
- https://datascience.stackexchange.com/questions/54907/model-cuda-in-pytorch
model.cuda() by default will send your model to the "current device", which can be set with torch.cuda.set_device(device). An alternative way to send the model to a specific device is model.to(torch.device('cuda:0')). This, of course, is subject to the device visibility specified in the environment variable CUDA_VISIBLE_DEVICES. You can check GPU usage with nvidia-smi. Also, nvtop is very nice for this.
The standard way in PyTorch to train a model on multiple GPUs is to use nn.DataParallel, which copies the model to the GPUs and, during training, splits the batch among them and combines the individual outputs.
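To make the first part of that concrete for this repo, here is a minimal sketch (my assumed workaround, not something main.py does today) of the two ways to pin a second run to the card at index 1; the DataParallel route is less relevant here, since the repo drives multi-GPU through Lightning's --gpus flag instead:

import os
import torch

# Option 1 (must take effect before CUDA is initialized): hide GPU 0 from this
# process entirely; the remaining 3090 then shows up as cuda:0, so the existing
# unconditional model.cuda() works unchanged.
# os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Option 2: keep both cards visible but switch the "current device", so that a
# bare model.cuda() lands on cuda:1 instead of cuda:0.
torch.cuda.set_device(1)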
Following in the hope this gets supported :)
No plans to support this, but PRs welcome if you can figure it out